2025-12-04T09:15:55.9077126Z Current runner version: '2.330.0'
2025-12-04T09:15:55.9082906Z Runner name: 'i-0f694664a515f0ebd'
2025-12-04T09:15:55.9083674Z Runner group name: 'default'
2025-12-04T09:15:55.9084551Z Machine name: 'ip-10-0-18-14'
2025-12-04T09:15:55.9087305Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T09:15:55.9089461Z Contents: read
2025-12-04T09:15:55.9089974Z Metadata: read
2025-12-04T09:15:55.9090464Z ##[endgroup]
2025-12-04T09:15:55.9092393Z Secret source: Actions
2025-12-04T09:15:55.9093046Z Prepare workflow directory
2025-12-04T09:15:55.9605638Z Prepare all required actions
2025-12-04T09:15:55.9643085Z Getting action download info
2025-12-04T09:15:56.2863729Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T09:15:58.7014998Z Download action repository 'pytorch/pytorch@main' (SHA:7716da9fb23f27a65b41f9f016a2afadf281c18f)
2025-12-04T09:16:14.9557511Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-12-04T09:16:15.3701375Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T09:16:15.6139732Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-12-04T09:16:15.7957413Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:16:16.0372084Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T09:16:16.4034002Z Getting action download info
2025-12-04T09:16:16.5222593Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T09:16:16.8089829Z Getting action download info
2025-12-04T09:16:16.9367573Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T09:16:17.1795883Z Getting action download info
2025-12-04T09:16:17.3150873Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2025-12-04T09:16:17.5240137Z Getting action download info
2025-12-04T09:16:17.6586135Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T09:16:17.6589854Z ##[group] Inputs
2025-12-04T09:16:17.6590261Z build-environment: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T09:16:17.6600090Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T09:16:17.6610734Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:16:17.6625433Z sync-tag:
2025-12-04T09:16:17.6626581Z timeout-minutes: 300
2025-12-04T09:16:17.6626830Z use-gha:
2025-12-04T09:16:17.6627034Z dashboard-tag:
2025-12-04T09:16:17.6627290Z s3-bucket: gha-artifacts
2025-12-04T09:16:17.6627572Z aws-role-to-assume:
2025-12-04T09:16:17.6628319Z disable-monitor: false
2025-12-04T09:16:17.6628615Z monitor-log-interval: 5
2025-12-04T09:16:17.6628934Z monitor-data-collect-interval: 1
2025-12-04T09:16:17.6629248Z ##[endgroup]
2025-12-04T09:16:17.6629961Z Complete job name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:16:17.7300420Z A job started hook has been configured by the self-hosted runner administrator
2025-12-04T09:16:17.7400795Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-12-04T09:16:17.7412339Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:16:17.7412939Z ##[endgroup]
2025-12-04T09:16:19.2751326Z Runner Type: linux.g5.4xlarge.nvidia.gpu
2025-12-04T09:16:19.2751790Z Instance Type: g5.4xlarge
2025-12-04T09:16:19.2752038Z AMI Name: unknown
2025-12-04T09:16:19.2803825Z AMI ID: ami-08982f1c5bf93d976
2025-12-04T09:16:24.7735937Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-12-04T09:16:24.7736364Z with:
2025-12-04T09:16:24.7736852Z github-secret: ***
2025-12-04T09:16:24.7737578Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2025-12-04T09:16:24.7738375Z activate-with-label: false
2025-12-04T09:16:24.7738646Z label: with-ssh
2025-12-04T09:16:24.7738879Z remove-existing-keys: true
2025-12-04T09:16:24.7739219Z fail-silently: true
2025-12-04T09:16:24.7739452Z env:
2025-12-04T09:16:24.7739640Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.7739915Z ##[endgroup]
2025-12-04T09:16:24.9131011Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-12-04T09:16:24.9132256Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-12-04T09:16:24.9315415Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T09:16:24.9315847Z with:
2025-12-04T09:16:24.9316054Z no-sudo: true
2025-12-04T09:16:24.9316282Z submodules: recursive
2025-12-04T09:16:24.9316536Z fetch-depth: 0
2025-12-04T09:16:24.9316965Z env:
2025-12-04T09:16:24.9317165Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.9317417Z ##[endgroup]
2025-12-04T09:16:24.9389287Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:16:24.9390278Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:16:24.9405450Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:16:24.9405857Z env:
2025-12-04T09:16:24.9406089Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.9406415Z ##[endgroup]
2025-12-04T09:16:24.9520887Z ##[group]Run # Use all available CPUs for fetching
2025-12-04T09:16:24.9521330Z # Use all available CPUs for fetching
2025-12-04T09:16:24.9521672Z cd "${GITHUB_WORKSPACE}"
2025-12-04T09:16:24.9522005Z git config --global fetch.parallel 0
2025-12-04T09:16:24.9522396Z git config --global submodule.fetchJobs 0
2025-12-04T09:16:24.9522751Z 
2025-12-04T09:16:24.9523104Z # Clean workspace. The default checkout action should also do this, but
2025-12-04T09:16:24.9523584Z # do it here as well just in case
2025-12-04T09:16:24.9523900Z if [[ -d .git ]]; then
2025-12-04T09:16:24.9524181Z   if [ -z "${NO_SUDO}" ]; then
2025-12-04T09:16:24.9524489Z     sudo git clean -ffdx
2025-12-04T09:16:24.9524761Z   else
2025-12-04T09:16:24.9524980Z     git clean -ffdx
2025-12-04T09:16:24.9525232Z   fi
2025-12-04T09:16:24.9525434Z fi
2025-12-04T09:16:24.9534771Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:16:24.9535145Z env:
2025-12-04T09:16:24.9535408Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.9535676Z NO_SUDO: true
2025-12-04T09:16:24.9535888Z ##[endgroup]
2025-12-04T09:16:24.9675994Z ##[group]Run actions/checkout@v4
2025-12-04T09:16:24.9676276Z with:
2025-12-04T09:16:24.9676536Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:16:24.9676874Z fetch-depth: 0
2025-12-04T09:16:24.9677097Z submodules: recursive
2025-12-04T09:16:24.9677343Z show-progress: false
2025-12-04T09:16:24.9677605Z repository: pytorch/pytorch
2025-12-04T09:16:24.9677979Z token: ***
2025-12-04T09:16:24.9678191Z ssh-strict: true
2025-12-04T09:16:24.9678420Z ssh-user: git
2025-12-04T09:16:24.9678655Z persist-credentials: true
2025-12-04T09:16:24.9678957Z clean: true
2025-12-04T09:16:24.9679224Z sparse-checkout-cone-mode: true
2025-12-04T09:16:24.9679518Z fetch-tags: false
2025-12-04T09:16:24.9679742Z lfs: false
2025-12-04T09:16:24.9679963Z set-safe-directory: true
2025-12-04T09:16:24.9680236Z env:
2025-12-04T09:16:24.9680433Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:16:24.9680678Z ##[endgroup]
2025-12-04T09:16:25.0772186Z Syncing repository: pytorch/pytorch
2025-12-04T09:16:25.0773537Z ##[group]Getting Git version info
2025-12-04T09:16:25.0774008Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:16:25.0774683Z [command]/usr/bin/git version
2025-12-04T09:16:25.0973377Z git version 2.50.1
2025-12-04T09:16:25.0999017Z ##[endgroup]
2025-12-04T09:16:25.1009978Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/35b33208-1641-45ab-8ee2-11b904f686c5/.gitconfig'
2025-12-04T09:16:25.1075037Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/35b33208-1641-45ab-8ee2-11b904f686c5' before making global git config changes
2025-12-04T09:16:25.1076077Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T09:16:25.1080646Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:16:25.1137046Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:16:25.1140707Z ##[group]Initializing the repository
2025-12-04T09:16:25.1145212Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:16:25.1224863Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-12-04T09:16:25.1225502Z hint: is subject to change. To configure the initial branch name to use in all
2025-12-04T09:16:25.1226076Z hint: of your new repositories, which will suppress this warning, call:
2025-12-04T09:16:25.1226491Z hint:
2025-12-04T09:16:25.1226783Z hint: git config --global init.defaultBranch <name>
2025-12-04T09:16:25.1227137Z hint:
2025-12-04T09:16:25.1227468Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-12-04T09:16:25.1228055Z hint: 'development'. The just-created branch can be renamed via this command:
2025-12-04T09:16:25.1228490Z hint:
2025-12-04T09:16:25.1228700Z hint: git branch -m <name>
2025-12-04T09:16:25.1228952Z hint:
2025-12-04T09:16:25.1229323Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-12-04T09:16:25.1235361Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-12-04T09:16:25.1248141Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-12-04T09:16:25.1297716Z ##[endgroup]
2025-12-04T09:16:25.1298150Z ##[group]Disabling automatic garbage collection
2025-12-04T09:16:25.1301603Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T09:16:25.1335866Z ##[endgroup]
2025-12-04T09:16:25.1336252Z ##[group]Setting up auth
2025-12-04T09:16:25.1342391Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T09:16:25.1376940Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T09:16:25.1808527Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T09:16:25.1842660Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T09:16:25.2235948Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T09:16:25.2272370Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T09:16:25.2665003Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T09:16:25.2716400Z ##[endgroup]
2025-12-04T09:16:25.2717045Z ##[group]Fetching the repository
2025-12-04T09:16:25.2724372Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-12-04T09:17:17.9961311Z From https://github.com/pytorch/pytorch
2025-12-04T09:17:17.9962044Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+
2025-12-04T09:17:17.9962754Z * [new branch] 2.9.1 -> origin/2.9.1
2025-12-04T09:17:17.9963348Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-12-04T09:17:17.9964279Z * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1
2025-12-04T09:17:17.9964920Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-12-04T09:17:17.9966578Z * [new branch] HOPrintFunc -> origin/HOPrintFunc
2025-12-04T09:17:17.9970116Z * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1
2025-12-04T09:17:17.9972944Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128
2025-12-04T09:17:17.9975420Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug
2025-12-04T09:17:17.9977085Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix
2025-12-04T09:17:17.9978844Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue
2025-12-04T09:17:17.9981013Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable
2025-12-04T09:17:17.9982698Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero
2025-12-04T09:17:17.9985027Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging
2025-12-04T09:17:17.9986538Z * [new branch] VLA_exp -> origin/VLA_exp
2025-12-04T09:17:17.9988796Z * [new branch] activation_bench -> origin/activation_bench
2025-12-04T09:17:17.9990618Z * [new branch] addmm-heuristic -> origin/addmm-heuristic
2025-12-04T09:17:17.9993246Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64
2025-12-04T09:17:17.9995032Z * [new branch] adi/test -> origin/adi/test
2025-12-04T09:17:17.9997038Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm
2025-12-04T09:17:17.9998876Z * [new branch] adi/test_m8g -> origin/adi/test_m8g
2025-12-04T09:17:18.0000668Z * [new branch] adi/test_onednn -> origin/adi/test_onednn
2025-12-04T09:17:18.0002531Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9
2025-12-04T09:17:18.0004279Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change
2025-12-04T09:17:18.0006121Z * [new branch] adi/test_timm -> origin/adi/test_timm
2025-12-04T09:17:18.0008724Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change
2025-12-04T09:17:18.0014462Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16
2025-12-04T09:17:18.0015154Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook
2025-12-04T09:17:18.0017696Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1
2025-12-04T09:17:18.0019714Z * [new branch] also-surround-shimh -> origin/also-surround-shimh
2025-12-04T09:17:18.0022461Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile
2025-12-04T09:17:18.0024424Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files
2025-12-04T09:17:18.0026268Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark
2025-12-04T09:17:18.0028319Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization
2025-12-04T09:17:18.0029784Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader
2025-12-04T09:17:18.0031837Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const
2025-12-04T09:17:18.0033642Z * [new branch] angelayi/lstm -> origin/angelayi/lstm
2025-12-04T09:17:18.0036286Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight
2025-12-04T09:17:18.0038813Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers
2025-12-04T09:17:18.0040735Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff
2025-12-04T09:17:18.0042717Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict
2025-12-04T09:17:18.0044844Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input
2025-12-04T09:17:18.0046863Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem
2025-12-04T09:17:18.0048697Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp
2025-12-04T09:17:18.0051253Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size
2025-12-04T09:17:18.0053137Z * [new branch] annotate_assert -> origin/annotate_assert
2025-12-04T09:17:18.0055226Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel
2025-12-04T09:17:18.0057292Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy
2025-12-04T09:17:18.0059176Z * [new branch] annotation_dynamo -> origin/annotation_dynamo
2025-12-04T09:17:18.0061115Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace
2025-12-04T09:17:18.0062977Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc
2025-12-04T09:17:18.0064879Z * [new branch] aoti_const_device -> origin/aoti_const_device
2025-12-04T09:17:18.0066773Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface
2025-12-04T09:17:18.0068626Z * [new branch] aoti_package_weights_binary -> origin/aoti_package_weights_binary
2025-12-04T09:17:18.0070435Z * [new branch] aoti_target_windows -> origin/aoti_target_windows
2025-12-04T09:17:18.0073805Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling
2025-12-04T09:17:18.0075646Z * [new branch] async_tp -> origin/async_tp
2025-12-04T09:17:18.0077691Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124
2025-12-04T09:17:18.0079665Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1
2025-12-04T09:17:18.0081723Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2
2025-12-04T09:17:18.0102889Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3
2025-12-04T09:17:18.0103509Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4
2025-12-04T09:17:18.0104069Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5
2025-12-04T09:17:18.0104615Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6
2025-12-04T09:17:18.0105175Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7
2025-12-04T09:17:18.0105693Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8
2025-12-04T09:17:18.0106332Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1
2025-12-04T09:17:18.0107081Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0
2025-12-04T09:17:18.0108063Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x
2025-12-04T09:17:18.0108760Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean
2025-12-04T09:17:18.0109424Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add
2025-12-04T09:17:18.0110181Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode
2025-12-04T09:17:18.0110851Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand
2025-12-04T09:17:18.0111366Z * [new branch] bahuang/test -> origin/bahuang/test
2025-12-04T09:17:18.0112368Z * [new branch] base/1.5 -> origin/base/1.5
2025-12-04T09:17:18.0114873Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention
2025-12-04T09:17:18.0116288Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops
2025-12-04T09:17:18.0118746Z * [new branch] benchmark-updates -> origin/benchmark-updates
2025-12-04T09:17:18.0120209Z * [new branch] benchmarking-script -> origin/benchmarking-script
2025-12-04T09:17:18.0123113Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26
2025-12-04T09:17:18.0125719Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass
2025-12-04T09:17:18.0128257Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input
2025-12-04T09:17:18.0129743Z * [new branch] bf/cg-backend -> origin/bf/cg-backend
2025-12-04T09:17:18.0131732Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test
2025-12-04T09:17:18.0133579Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check
2025-12-04T09:17:18.0135574Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf
2025-12-04T09:17:18.0137100Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log
2025-12-04T09:17:18.0139089Z * [new branch] bf/cudagraph -> origin/bf/cudagraph
2025-12-04T09:17:18.0141722Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation
2025-12-04T09:17:18.0143663Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark
2025-12-04T09:17:18.0145123Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition
2025-12-04T09:17:18.0147367Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench
2025-12-04T09:17:18.0149314Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition
2025-12-04T09:17:18.0151154Z * [new branch] bf/lite -> origin/bf/lite
2025-12-04T09:17:18.0153076Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible
2025-12-04T09:17:18.0155175Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols
2025-12-04T09:17:18.0157624Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan
2025-12-04T09:17:18.0159492Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu
2025-12-04T09:17:18.0161455Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback
2025-12-04T09:17:18.0163077Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d
2025-12-04T09:17:18.0165091Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025
2025-12-04T09:17:18.0167038Z * [new branch] bf/transformer-pin-4-57-3 -> origin/bf/transformer-pin-4-57-3
2025-12-04T09:17:18.0168975Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492
2025-12-04T09:17:18.0170613Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb
2025-12-04T09:17:18.0172529Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129
2025-12-04T09:17:18.0174448Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d
2025-12-04T09:17:18.0176282Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e
2025-12-04T09:17:18.0178201Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c
2025-12-04T09:17:18.0179855Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c
2025-12-04T09:17:18.0181857Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f
2025-12-04T09:17:18.0183825Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0
2025-12-04T09:17:18.0186045Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149
2025-12-04T09:17:18.0187531Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a
2025-12-04T09:17:18.0189506Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b
2025-12-04T09:17:18.0191257Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new
2025-12-04T09:17:18.0193161Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8
2025-12-04T09:17:18.0194907Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2
2025-12-04T09:17:18.0196837Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563
2025-12-04T09:17:18.0199400Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type
2025-12-04T09:17:18.0201262Z * [new branch] brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx
2025-12-04T09:17:18.0202978Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check
2025-12-04T09:17:18.0204858Z * [new branch] bwd-backup -> origin/bwd-backup
2025-12-04T09:17:18.0206898Z * [new branch] c57382a49 -> origin/c57382a49
2025-12-04T09:17:18.0208868Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa
2025-12-04T09:17:18.0210859Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa
2025-12-04T09:17:18.0213550Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push
2025-12-04T09:17:18.0215478Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1
2025-12-04T09:17:18.0217576Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_
2025-12-04T09:17:18.0219399Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_
2025-12-04T09:17:18.0221944Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_
2025-12-04T09:17:18.0223633Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_
2025-12-04T09:17:18.0225789Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_
2025-12-04T09:17:18.0227911Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_
2025-12-04T09:17:18.0229573Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_
2025-12-04T09:17:18.0231671Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_
2025-12-04T09:17:18.0233753Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_
2025-12-04T09:17:18.0235521Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_
2025-12-04T09:17:18.0237659Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_
2025-12-04T09:17:18.0239366Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_
2025-12-04T09:17:18.0241174Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_
2025-12-04T09:17:18.0243359Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_
2025-12-04T09:17:18.0245304Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_
2025-12-04T09:17:18.0247025Z * [new branch] cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_
2025-12-04T09:17:18.0249198Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_
2025-12-04T09:17:18.0251021Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_
2025-12-04T09:17:18.0253118Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_
2025-12-04T09:17:18.0254715Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040
2025-12-04T09:17:18.0256802Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457
2025-12-04T09:17:18.0258756Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338
2025-12-04T09:17:18.0260887Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458
2025-12-04T09:17:18.0262435Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586
2025-12-04T09:17:18.0264472Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956
2025-12-04T09:17:18.0266386Z * [new branch] ci_attn -> origin/ci_attn
2025-12-04T09:17:18.0268283Z * [new branch] codex-testing -> origin/codex-testing
2025-12-04T09:17:18.0271273Z * [new branch] codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions
2025-12-04T09:17:18.0272604Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch
2025-12-04T09:17:18.0275320Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id
2025-12-04T09:17:18.0277326Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run
2025-12-04T09:17:18.0278836Z * [new branch] compatiblpy39util -> origin/compatiblpy39util
2025-12-04T09:17:18.0280888Z * [new branch] cond_hop_device -> origin/cond_hop_device
2025-12-04T09:17:18.0282796Z * [new branch] context_test -> origin/context_test
2025-12-04T09:17:18.0285609Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip
2025-12-04T09:17:18.0287993Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests
2025-12-04T09:17:18.0289978Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade
2025-12-04T09:17:18.0292658Z * [new branch] crpa/typo-in-inductor_comm_lowering -> origin/crpa/typo-in-inductor_comm_lowering
2025-12-04T09:17:18.0295087Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml
2025-12-04T09:17:18.0296630Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs
2025-12-04T09:17:18.0298568Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2
2025-12-04T09:17:18.0300523Z * [new branch] csl/clean_up -> origin/csl/clean_up
2025-12-04T09:17:18.0302372Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit
2025-12-04T09:17:18.0303869Z * [new branch] csl/katex -> origin/csl/katex
2025-12-04T09:17:18.0306138Z * [new branch] csl/larger_runner -> origin/csl/larger_runner
2025-12-04T09:17:18.0308546Z * [new branch] csl/lint_testing -> origin/csl/lint_testing
2025-12-04T09:17:18.0310839Z * [new branch] csl/lint_thing -> origin/csl/lint_thing
2025-12-04T09:17:18.0313014Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff
2025-12-04T09:17:18.0314557Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json
2025-12-04T09:17:18.0316563Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding
2025-12-04T09:17:18.0318277Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker
2025-12-04T09:17:18.0320340Z * [new branch] csl/print_timing -> origin/csl/print_timing
2025-12-04T09:17:18.0322250Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment
2025-12-04T09:17:18.0324211Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var
2025-12-04T09:17:18.0326258Z * [new branch] csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel
2025-12-04T09:17:18.0327872Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel
2025-12-04T09:17:18.0329758Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars
2025-12-04T09:17:18.0331632Z * [new branch] csl/revert_open -> origin/csl/revert_open
2025-12-04T09:17:18.0333480Z * [new branch] csl/skip_build -> origin/csl/skip_build
2025-12-04T09:17:18.0335393Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs
2025-12-04T09:17:18.0337172Z * [new branch] csl/td_job_level -> origin/csl/td_job_level
2025-12-04T09:17:18.0339207Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner
2025-12-04T09:17:18.0341324Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn
2025-12-04T09:17:18.0342879Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence
2025-12-04T09:17:18.0345203Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running
2025-12-04T09:17:18.0346478Z * [new branch] csl/win_sccache -> origin/csl/win_sccache
2025-12-04T09:17:18.0348443Z * [new branch] csl/xml_stuff -> origin/csl/xml_stuff
2025-12-04T09:17:18.0350401Z * [new branch] cublasrelax2 -> origin/cublasrelax2
2025-12-04T09:17:18.0352322Z * [new branch] cuda_mempool -> origin/cuda_mempool
2025-12-04T09:17:18.0354168Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict
2025-12-04T09:17:18.0356750Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace
2025-12-04T09:17:18.0359332Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3
2025-12-04T09:17:18.0361254Z * [new branch] debug-guard -> origin/debug-guard
2025-12-04T09:17:18.0363208Z * [new branch] delete-quant-docs -> origin/delete-quant-docs
2025-12-04T09:17:18.0369368Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0
2025-12-04T09:17:18.0371240Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1
2025-12-04T09:17:18.0373517Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper
2025-12-04T09:17:18.0375212Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64
2025-12-04T09:17:18.0378360Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt
2025-12-04T09:17:18.0381533Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd
2025-12-04T09:17:18.0383285Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked
2025-12-04T09:17:18.0385454Z * [new branch] dev/joona/cat -> origin/dev/joona/cat
2025-12-04T09:17:18.0387323Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag
2025-12-04T09:17:18.0389068Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest
2025-12-04T09:17:18.0391341Z * [new branch] 
dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T09:17:18.0393436Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T09:17:18.0395858Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T09:17:18.0398198Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T09:17:18.0400859Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T09:17:18.0402931Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T09:17:18.0405210Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T09:17:18.0407036Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T09:17:18.0408804Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T09:17:18.0411177Z * [new branch] divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T09:17:18.0412952Z * [new branch] docs -> origin/docs 2025-12-04T09:17:18.0414898Z * [new branch] documentation -> origin/documentation 2025-12-04T09:17:18.0416754Z * [new branch] eager_model_benchmarks -> origin/eager_model_benchmarks 2025-12-04T09:17:18.0419884Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T09:17:18.0421437Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T09:17:18.0423040Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T09:17:18.0425137Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T09:17:18.0427083Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T09:17:18.0428991Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T09:17:18.0430961Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T09:17:18.0432821Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T09:17:18.0434673Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T09:17:18.0437188Z * [new 
branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T09:17:18.0439259Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T09:17:18.0440609Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T09:17:18.0442679Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T09:17:18.0444452Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T09:17:18.0446801Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T09:17:18.0449188Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T09:17:18.0450633Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T09:17:18.0453088Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T09:17:18.0454551Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T09:17:18.0456359Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T09:17:18.0458671Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T09:17:18.0460302Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T09:17:18.0462314Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T09:17:18.0464426Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T09:17:18.0466041Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T09:17:18.0468512Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 
2025-12-04T09:17:18.0470233Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T09:17:18.0472525Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T09:17:18.0474627Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T09:17:18.0476154Z * [new branch] exec -> origin/exec 2025-12-04T09:17:18.0478390Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-12-04T09:17:18.0480383Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-12-04T09:17:18.0482291Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-12-04T09:17:18.0484289Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T09:17:18.0486126Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-12-04T09:17:18.0487950Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T09:17:18.0489864Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T09:17:18.0491824Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T09:17:18.0493699Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T09:17:18.0495514Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T09:17:18.0497352Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T09:17:18.0499503Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T09:17:18.0501439Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T09:17:18.0503393Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T09:17:18.0505345Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T09:17:18.0507246Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T09:17:18.0509384Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T09:17:18.0511246Z * [new branch] 
export-D83769447 -> origin/export-D83769447 2025-12-04T09:17:18.0513078Z * [new branch] export-D84089824 -> origin/export-D84089824 2025-12-04T09:17:18.0514967Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T09:17:18.0517345Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T09:17:18.0519610Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T09:17:18.0521372Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T09:17:18.0523301Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T09:17:18.0525336Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T09:17:18.0527165Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T09:17:18.0529165Z * [new branch] export-D86474796 -> origin/export-D86474796 2025-12-04T09:17:18.0531216Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T09:17:18.0533102Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T09:17:18.0535048Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T09:17:18.0537000Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T09:17:18.0539216Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T09:17:18.0541164Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T09:17:18.0542772Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T09:17:18.0544696Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T09:17:18.0547144Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T09:17:18.0549285Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T09:17:18.0551908Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T09:17:18.0553738Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T09:17:18.0556323Z * [new 
branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T09:17:18.0558262Z * [new branch] fca -> origin/fca 2025-12-04T09:17:18.0560174Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-12-04T09:17:18.0561960Z * [new branch] fca5 -> origin/fca5 2025-12-04T09:17:18.0564534Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T09:17:18.0566461Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T09:17:18.0568704Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T09:17:18.0570545Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T09:17:18.0573126Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T09:17:18.0574970Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-12-04T09:17:18.0576786Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T09:17:18.0578286Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T09:17:18.0580389Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T09:17:18.0582364Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T09:17:18.0583783Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T09:17:18.0585682Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T09:17:18.0587699Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T09:17:18.0589713Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T09:17:18.0591230Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T09:17:18.0593397Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T09:17:18.0595376Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T09:17:18.0596992Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 
2025-12-04T09:17:18.0599034Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 2025-12-04T09:17:18.0600834Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T09:17:18.0602672Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T09:17:18.0604536Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T09:17:18.0606482Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T09:17:18.0608882Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T09:17:18.0610714Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T09:17:18.0612503Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T09:17:18.0614539Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T09:17:18.0616358Z * [new branch] flex_flash -> origin/flex_flash 2025-12-04T09:17:18.0619124Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T09:17:18.0620809Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T09:17:18.0622682Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T09:17:18.0624691Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T09:17:18.0626719Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T09:17:18.0629237Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T09:17:18.0631245Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T09:17:18.0634060Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 2025-12-04T09:17:18.0636544Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 2025-12-04T09:17:18.0640321Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T09:17:18.0642213Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T09:17:18.0645386Z * [new branch] 
gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T09:17:18.0647219Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-12-04T09:17:18.0650669Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T09:17:18.0652512Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T09:17:18.0655615Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T09:17:18.0657414Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T09:17:18.0659481Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T09:17:18.0661983Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T09:17:18.0663755Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T09:17:18.0665620Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T09:17:18.0668441Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T09:17:18.0669815Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T09:17:18.0671756Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T09:17:18.0674258Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T09:17:18.0676105Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T09:17:18.0677915Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T09:17:18.0680600Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T09:17:18.0682435Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T09:17:18.0684263Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T09:17:18.0687325Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T09:17:18.0689179Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T09:17:18.0691014Z * [new branch] gh/H-Huang/228/orig 
-> origin/gh/H-Huang/228/orig 2025-12-04T09:17:18.0694204Z * [new branch] gh/IvanKobzarev/150/base -> origin/gh/IvanKobzarev/150/base 2025-12-04T09:17:18.0695778Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T09:17:18.0697764Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T09:17:18.0700579Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T09:17:18.0702441Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T09:17:18.0704589Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T09:17:18.0706953Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T09:17:18.0709029Z * [new branch] gh/IvanKobzarev/159/head -> origin/gh/IvanKobzarev/159/head 2025-12-04T09:17:18.0710955Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T09:17:18.0713479Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T09:17:18.0715436Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T09:17:18.0717030Z * [new branch] gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T09:17:18.0719814Z * [new branch] gh/IvanKobzarev/163/base -> origin/gh/IvanKobzarev/163/base 2025-12-04T09:17:18.0721675Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T09:17:18.0723841Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T09:17:18.0726509Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T09:17:18.0728161Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T09:17:18.0730113Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T09:17:18.0732750Z * [new branch] gh/IvanKobzarev/167/base -> 
origin/gh/IvanKobzarev/167/base 2025-12-04T09:17:18.0734276Z * [new branch] gh/IvanKobzarev/167/head -> origin/gh/IvanKobzarev/167/head 2025-12-04T09:17:18.0736253Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T09:17:18.0738816Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T09:17:18.0741027Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T09:17:18.0742439Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T09:17:18.0745074Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T09:17:18.0746750Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T09:17:18.0748681Z * [new branch] gh/IvanKobzarev/169/orig -> origin/gh/IvanKobzarev/169/orig 2025-12-04T09:17:18.0751265Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T09:17:18.0752794Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T09:17:18.0754704Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T09:17:18.0757513Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T09:17:18.0759128Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T09:17:18.0761127Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T09:17:18.0763931Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T09:17:18.0765824Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T09:17:18.0767438Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T09:17:18.0770211Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T09:17:18.0772103Z * [new branch] gh/IvanKobzarev/173/head -> 
origin/gh/IvanKobzarev/173/head 2025-12-04T09:17:18.0773706Z * [new branch] gh/IvanKobzarev/173/orig -> origin/gh/IvanKobzarev/173/orig 2025-12-04T09:17:18.0776371Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T09:17:18.0778334Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T09:17:18.0780275Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T09:17:18.0782814Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T09:17:18.0784682Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T09:17:18.0786913Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T09:17:18.0790151Z * [new branch] gh/IvanKobzarev/176/base -> origin/gh/IvanKobzarev/176/base 2025-12-04T09:17:18.0792055Z * [new branch] gh/IvanKobzarev/176/head -> origin/gh/IvanKobzarev/176/head 2025-12-04T09:17:18.0793640Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T09:17:18.0796676Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T09:17:18.0798596Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T09:17:18.0800440Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T09:17:18.0803175Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T09:17:18.0805058Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T09:17:18.0806931Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T09:17:18.0809753Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T09:17:18.0811317Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T09:17:18.0813481Z * [new branch] gh/IvanKobzarev/179/orig -> 
origin/gh/IvanKobzarev/179/orig 2025-12-04T09:17:18.0816077Z * [new branch] gh/IvanKobzarev/180/base -> origin/gh/IvanKobzarev/180/base 2025-12-04T09:17:18.0817696Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T09:17:18.0819850Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T09:17:18.0822681Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T09:17:18.0824519Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T09:17:18.0836118Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T09:17:18.0836937Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T09:17:18.0837743Z * [new branch] gh/IvanKobzarev/182/head -> origin/gh/IvanKobzarev/182/head 2025-12-04T09:17:18.0838477Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T09:17:18.0839182Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T09:17:18.0839969Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T09:17:18.0840582Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T09:17:18.0841763Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T09:17:18.0843838Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T09:17:18.0845782Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T09:17:18.0848868Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T09:17:18.0850798Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T09:17:18.0853182Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T09:17:18.0854777Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 
2025-12-04T09:17:18.0857689Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-12-04T09:17:18.0859971Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T09:17:18.0862273Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T09:17:18.0864154Z * [new branch] gh/NikhilAPatel/5/head -> origin/gh/NikhilAPatel/5/head 2025-12-04T09:17:18.0866083Z * [new branch] gh/NikhilAPatel/5/orig -> origin/gh/NikhilAPatel/5/orig 2025-12-04T09:17:18.0869113Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T09:17:18.0870940Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T09:17:18.0872768Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T09:17:18.0875352Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-12-04T09:17:18.0877202Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T09:17:18.0879103Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T09:17:18.0881603Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T09:17:18.0883425Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T09:17:18.0885276Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T09:17:18.0887798Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T09:17:18.0889792Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T09:17:18.0891276Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T09:17:18.0893887Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T09:17:18.0895541Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T09:17:18.0897696Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T09:17:18.0900208Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T09:17:18.0902028Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 
2025-12-04T09:17:18.0903840Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 2025-12-04T09:17:18.0906312Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T09:17:18.0908344Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T09:17:18.0910193Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T09:17:18.0912674Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T09:17:18.0914134Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T09:17:18.0916098Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T09:17:18.0918647Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T09:17:18.0921148Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T09:17:18.0922614Z * [new branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T09:17:18.0924606Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T09:17:18.0927192Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T09:17:18.0928701Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T09:17:18.0930681Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T09:17:18.0933271Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T09:17:18.0934737Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T09:17:18.0936672Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T09:17:18.0939326Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T09:17:18.0941166Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T09:17:18.0943006Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T09:17:18.0946041Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T09:17:18.0948058Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-12-04T09:17:18.0949621Z * [new 
branch] gh/PaulZhang12/25/orig -> origin/gh/PaulZhang12/25/orig 2025-12-04T09:17:18.0952322Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T09:17:18.0954266Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T09:17:18.0956126Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T09:17:18.0959009Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T09:17:18.0961650Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T09:17:18.0963244Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T09:17:18.0965972Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T09:17:18.0967490Z * [new branch] gh/PaulZhang12/37/head -> origin/gh/PaulZhang12/37/head 2025-12-04T09:17:18.0969523Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T09:17:18.0972134Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T09:17:18.0974085Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T09:17:18.0976028Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T09:17:18.0978708Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T09:17:18.0980663Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T09:17:18.0983186Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T09:17:18.0985055Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T09:17:18.0986905Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T09:17:18.0989318Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T09:17:18.0991175Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T09:17:18.0993795Z * [new 
branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 2025-12-04T09:17:18.0995346Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T09:17:18.0997256Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T09:17:18.0999895Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T09:17:18.1001772Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T09:17:18.1003640Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T09:17:18.1006221Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T09:17:18.1008333Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T09:17:18.1011649Z * [new branch] gh/PaulZhang12/47/orig -> origin/gh/PaulZhang12/47/orig 2025-12-04T09:17:18.1013935Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T09:17:18.1015538Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T09:17:18.1017518Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T09:17:18.1020917Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T09:17:18.1022390Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T09:17:18.1025834Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 2025-12-04T09:17:18.1027447Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head 2025-12-04T09:17:18.1030244Z * [new branch] gh/SherlockNoMad/10/base -> origin/gh/SherlockNoMad/10/base 2025-12-04T09:17:18.1032173Z * [new branch] gh/SherlockNoMad/10/head -> origin/gh/SherlockNoMad/10/head 2025-12-04T09:17:18.1034081Z * [new branch] gh/SherlockNoMad/10/orig -> origin/gh/SherlockNoMad/10/orig 2025-12-04T09:17:18.1036570Z * [new branch] gh/SherlockNoMad/11/base -> origin/gh/SherlockNoMad/11/base 
2025-12-04T09:17:18.1038170Z * [new branch] gh/SherlockNoMad/11/head -> origin/gh/SherlockNoMad/11/head
2025-12-04T09:17:18.1040407Z * [new branch] gh/SherlockNoMad/11/orig -> origin/gh/SherlockNoMad/11/orig
2025-12-04T09:17:18.1042667Z * [new branch] gh/SherlockNoMad/12/base -> origin/gh/SherlockNoMad/12/base
2025-12-04T09:17:18.1044258Z * [new branch] gh/SherlockNoMad/12/head -> origin/gh/SherlockNoMad/12/head
2025-12-04T09:17:18.1046008Z * [new branch] gh/SherlockNoMad/12/orig -> origin/gh/SherlockNoMad/12/orig
2025-12-04T09:17:18.1048812Z * [new branch] gh/SherlockNoMad/15/base -> origin/gh/SherlockNoMad/15/base
2025-12-04T09:17:18.1050669Z * [new branch] gh/SherlockNoMad/15/head -> origin/gh/SherlockNoMad/15/head
2025-12-04T09:17:18.1052570Z * [new branch] gh/SherlockNoMad/15/orig -> origin/gh/SherlockNoMad/15/orig
2025-12-04T09:17:18.1055052Z * [new branch] gh/SherlockNoMad/17/base -> origin/gh/SherlockNoMad/17/base
2025-12-04T09:17:18.1056903Z * [new branch] gh/SherlockNoMad/17/head -> origin/gh/SherlockNoMad/17/head
2025-12-04T09:17:18.1058507Z * [new branch] gh/SherlockNoMad/17/orig -> origin/gh/SherlockNoMad/17/orig
2025-12-04T09:17:18.1061542Z * [new branch] gh/SherlockNoMad/18/base -> origin/gh/SherlockNoMad/18/base
2025-12-04T09:17:18.1063401Z * [new branch] gh/SherlockNoMad/18/head -> origin/gh/SherlockNoMad/18/head
2025-12-04T09:17:18.1065032Z * [new branch] gh/SherlockNoMad/18/orig -> origin/gh/SherlockNoMad/18/orig
2025-12-04T09:17:18.1067546Z * [new branch] gh/SherlockNoMad/19/base -> origin/gh/SherlockNoMad/19/base
2025-12-04T09:17:18.1069500Z * [new branch] gh/SherlockNoMad/19/head -> origin/gh/SherlockNoMad/19/head
2025-12-04T09:17:18.1071395Z * [new branch] gh/SherlockNoMad/19/orig -> origin/gh/SherlockNoMad/19/orig
2025-12-04T09:17:18.1073789Z * [new branch] gh/SherlockNoMad/2/base -> origin/gh/SherlockNoMad/2/base
2025-12-04T09:17:18.1075361Z * [new branch] gh/SherlockNoMad/2/head -> origin/gh/SherlockNoMad/2/head
2025-12-04T09:17:18.1077883Z * [new branch] gh/SherlockNoMad/20/base -> origin/gh/SherlockNoMad/20/base
2025-12-04T09:17:18.1079956Z * [new branch] gh/SherlockNoMad/20/head -> origin/gh/SherlockNoMad/20/head
2025-12-04T09:17:18.1081518Z * [new branch] gh/SherlockNoMad/20/orig -> origin/gh/SherlockNoMad/20/orig
2025-12-04T09:17:18.1084384Z * [new branch] gh/SherlockNoMad/21/base -> origin/gh/SherlockNoMad/21/base
2025-12-04T09:17:18.1086376Z * [new branch] gh/SherlockNoMad/21/head -> origin/gh/SherlockNoMad/21/head
2025-12-04T09:17:18.1087912Z * [new branch] gh/SherlockNoMad/21/orig -> origin/gh/SherlockNoMad/21/orig
2025-12-04T09:17:18.1090458Z * [new branch] gh/SherlockNoMad/3/base -> origin/gh/SherlockNoMad/3/base
2025-12-04T09:17:18.1092026Z * [new branch] gh/SherlockNoMad/3/head -> origin/gh/SherlockNoMad/3/head
2025-12-04T09:17:18.1094509Z * [new branch] gh/SherlockNoMad/4/base -> origin/gh/SherlockNoMad/4/base
2025-12-04T09:17:18.1096141Z * [new branch] gh/SherlockNoMad/4/head -> origin/gh/SherlockNoMad/4/head
2025-12-04T09:17:18.1098750Z * [new branch] gh/SherlockNoMad/5/base -> origin/gh/SherlockNoMad/5/base
2025-12-04T09:17:18.1100792Z * [new branch] gh/SherlockNoMad/5/head -> origin/gh/SherlockNoMad/5/head
2025-12-04T09:17:18.1104575Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base
2025-12-04T09:17:18.1107040Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base
2025-12-04T09:17:18.1109690Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base
2025-12-04T09:17:18.1112235Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base
2025-12-04T09:17:18.1115520Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base
2025-12-04T09:17:18.1117462Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head
2025-12-04T09:17:18.1120039Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base
2025-12-04T09:17:18.1121718Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head
2025-12-04T09:17:18.1124231Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base
2025-12-04T09:17:18.1126096Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head
2025-12-04T09:17:18.1128603Z * [new branch] gh/StrongerXi/73/base -> origin/gh/StrongerXi/73/base
2025-12-04T09:17:18.1130412Z * [new branch] gh/StrongerXi/73/head -> origin/gh/StrongerXi/73/head
2025-12-04T09:17:18.1132283Z * [new branch] gh/StrongerXi/73/orig -> origin/gh/StrongerXi/73/orig
2025-12-04T09:17:18.1135440Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base
2025-12-04T09:17:18.1137228Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head
2025-12-04T09:17:18.1139158Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig
2025-12-04T09:17:18.1141800Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base
2025-12-04T09:17:18.1143639Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head
2025-12-04T09:17:18.1145462Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig
2025-12-04T09:17:18.1148126Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base
2025-12-04T09:17:18.1149918Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head
2025-12-04T09:17:18.1151465Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig
2025-12-04T09:17:18.1154286Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base
2025-12-04T09:17:18.1156178Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head
2025-12-04T09:17:18.1158054Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig
2025-12-04T09:17:18.1160423Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base
2025-12-04T09:17:18.1162231Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head
2025-12-04T09:17:18.1164054Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig
2025-12-04T09:17:18.1166667Z * [new branch] gh/XilunWu/171/base -> origin/gh/XilunWu/171/base
2025-12-04T09:17:18.1168649Z * [new branch] gh/XilunWu/171/head -> origin/gh/XilunWu/171/head
2025-12-04T09:17:18.1170469Z * [new branch] gh/XilunWu/171/orig -> origin/gh/XilunWu/171/orig
2025-12-04T09:17:18.1172887Z * [new branch] gh/XilunWu/173/base -> origin/gh/XilunWu/173/base
2025-12-04T09:17:18.1174823Z * [new branch] gh/XilunWu/173/head -> origin/gh/XilunWu/173/head
2025-12-04T09:17:18.1176635Z * [new branch] gh/XilunWu/173/orig -> origin/gh/XilunWu/173/orig
2025-12-04T09:17:18.1179200Z * [new branch] gh/XilunWu/175/base -> origin/gh/XilunWu/175/base
2025-12-04T09:17:18.1181228Z * [new branch] gh/XilunWu/175/head -> origin/gh/XilunWu/175/head
2025-12-04T09:17:18.1183059Z * [new branch] gh/XilunWu/175/orig -> origin/gh/XilunWu/175/orig
2025-12-04T09:17:18.1185623Z * [new branch] gh/XilunWu/176/base -> origin/gh/XilunWu/176/base
2025-12-04T09:17:18.1187490Z * [new branch] gh/XilunWu/176/head -> origin/gh/XilunWu/176/head
2025-12-04T09:17:18.1189494Z * [new branch] gh/XilunWu/176/orig -> origin/gh/XilunWu/176/orig
2025-12-04T09:17:18.1192481Z * [new branch] gh/XuehaiPan/14/base -> origin/gh/XuehaiPan/14/base
2025-12-04T09:17:18.1194305Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head
2025-12-04T09:17:18.1195905Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig
2025-12-04T09:17:18.1198691Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base
2025-12-04T09:17:18.1200595Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head
2025-12-04T09:17:18.1202564Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig
2025-12-04T09:17:18.1205051Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base
2025-12-04T09:17:18.1206878Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head
2025-12-04T09:17:18.1208660Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig
2025-12-04T09:17:18.1211430Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base
2025-12-04T09:17:18.1213299Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head
2025-12-04T09:17:18.1215166Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig
2025-12-04T09:17:18.1217692Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base
2025-12-04T09:17:18.1219701Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head
2025-12-04T09:17:18.1221496Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig
2025-12-04T09:17:18.1224063Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base
2025-12-04T09:17:18.1226061Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head
2025-12-04T09:17:18.1227919Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig
2025-12-04T09:17:18.1230447Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base
2025-12-04T09:17:18.1232273Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head
2025-12-04T09:17:18.1234238Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig
2025-12-04T09:17:18.1236798Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base
2025-12-04T09:17:18.1238613Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head
2025-12-04T09:17:18.1240440Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig
2025-12-04T09:17:18.1242993Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base
2025-12-04T09:17:18.1245083Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head
2025-12-04T09:17:18.1246501Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig
2025-12-04T09:17:18.1249122Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base
2025-12-04T09:17:18.1250994Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head
2025-12-04T09:17:18.1252795Z * [new branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig
2025-12-04T09:17:18.1255298Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base
2025-12-04T09:17:18.1257102Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head
2025-12-04T09:17:18.1259036Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig
2025-12-04T09:17:18.1261890Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base
2025-12-04T09:17:18.1263405Z * [new branch] gh/XuehaiPan/365/head -> origin/gh/XuehaiPan/365/head
2025-12-04T09:17:18.1265392Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig
2025-12-04T09:17:18.1268084Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base
2025-12-04T09:17:18.1269709Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head
2025-12-04T09:17:18.1272339Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base
2025-12-04T09:17:18.1274199Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head
2025-12-04T09:17:18.1275772Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig
2025-12-04T09:17:18.1278589Z * [new branch] gh/XuehaiPan/390/base -> origin/gh/XuehaiPan/390/base
2025-12-04T09:17:18.1280462Z * [new branch] gh/XuehaiPan/390/head -> origin/gh/XuehaiPan/390/head
2025-12-04T09:17:18.1282258Z * [new branch] gh/XuehaiPan/390/orig -> origin/gh/XuehaiPan/390/orig
2025-12-04T09:17:18.1284830Z * [new branch] gh/XuehaiPan/391/base -> origin/gh/XuehaiPan/391/base
2025-12-04T09:17:18.1286384Z * [new branch] gh/XuehaiPan/391/head -> origin/gh/XuehaiPan/391/head
2025-12-04T09:17:18.1288411Z * [new branch] gh/XuehaiPan/391/orig -> origin/gh/XuehaiPan/391/orig
2025-12-04T09:17:18.1290879Z * [new branch] gh/XuehaiPan/392/base -> origin/gh/XuehaiPan/392/base
2025-12-04T09:17:18.1292701Z * [new branch] gh/XuehaiPan/392/head -> origin/gh/XuehaiPan/392/head
2025-12-04T09:17:18.1294588Z * [new branch] gh/XuehaiPan/392/orig -> origin/gh/XuehaiPan/392/orig
2025-12-04T09:17:18.1297652Z * [new branch] gh/XuehaiPan/394/base -> origin/gh/XuehaiPan/394/base
2025-12-04T09:17:18.1299668Z * [new branch] gh/XuehaiPan/394/head -> origin/gh/XuehaiPan/394/head
2025-12-04T09:17:18.1301471Z * [new branch] gh/XuehaiPan/394/orig -> origin/gh/XuehaiPan/394/orig
2025-12-04T09:17:18.1304098Z * [new branch] gh/XuehaiPan/397/base -> origin/gh/XuehaiPan/397/base
2025-12-04T09:17:18.1305947Z * [new branch] gh/XuehaiPan/397/head -> origin/gh/XuehaiPan/397/head
2025-12-04T09:17:18.1308174Z * [new branch] gh/XuehaiPan/397/orig -> origin/gh/XuehaiPan/397/orig
2025-12-04T09:17:18.1310633Z * [new branch] gh/XuehaiPan/398/base -> origin/gh/XuehaiPan/398/base
2025-12-04T09:17:18.1312180Z * [new branch] gh/XuehaiPan/398/head -> origin/gh/XuehaiPan/398/head
2025-12-04T09:17:18.1314172Z * [new branch] gh/XuehaiPan/398/orig -> origin/gh/XuehaiPan/398/orig
2025-12-04T09:17:18.1316760Z * [new branch] gh/XuehaiPan/399/base -> origin/gh/XuehaiPan/399/base
2025-12-04T09:17:18.1318589Z * [new branch] gh/XuehaiPan/399/head -> origin/gh/XuehaiPan/399/head
2025-12-04T09:17:18.1320395Z * [new branch] gh/XuehaiPan/399/orig -> origin/gh/XuehaiPan/399/orig
2025-12-04T09:17:18.1323050Z * [new branch] gh/XuehaiPan/400/base -> origin/gh/XuehaiPan/400/base
2025-12-04T09:17:18.1324918Z * [new branch] gh/XuehaiPan/400/head -> origin/gh/XuehaiPan/400/head
2025-12-04T09:17:18.1326733Z * [new branch] gh/XuehaiPan/400/orig -> origin/gh/XuehaiPan/400/orig
2025-12-04T09:17:18.1329847Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base
2025-12-04T09:17:18.1331431Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head
2025-12-04T09:17:18.1333462Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig
2025-12-04T09:17:18.1336307Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base
2025-12-04T09:17:18.1337734Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head
2025-12-04T09:17:18.1340667Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base
2025-12-04T09:17:18.1342279Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head
2025-12-04T09:17:18.1345083Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base
2025-12-04T09:17:18.1346962Z * [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head
2025-12-04T09:17:18.1349429Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base
2025-12-04T09:17:18.1351246Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head
2025-12-04T09:17:18.1353819Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base
2025-12-04T09:17:18.1355688Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head
2025-12-04T09:17:18.1358112Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base
2025-12-04T09:17:18.1359688Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head
2025-12-04T09:17:18.1362294Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base
2025-12-04T09:17:18.1364080Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head
2025-12-04T09:17:18.1366012Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig
2025-12-04T09:17:18.1369259Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base
2025-12-04T09:17:18.1371097Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head
2025-12-04T09:17:18.1373473Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base
2025-12-04T09:17:18.1375339Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head
2025-12-04T09:17:18.1377774Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas
2025-12-04T09:17:18.1379722Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm
2025-12-04T09:17:18.1381591Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16
2025-12-04T09:17:18.1384730Z * [new branch] gh/albanD/4/base -> origin/gh/albanD/4/base
2025-12-04T09:17:18.1386224Z * [new branch] gh/albanD/4/head -> origin/gh/albanD/4/head
2025-12-04T09:17:18.1388192Z * [new branch] gh/albanD/4/orig -> origin/gh/albanD/4/orig
2025-12-04T09:17:18.1391070Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init
2025-12-04T09:17:18.1394138Z * [new branch] gh/alexsamardzic/12/base -> origin/gh/alexsamardzic/12/base
2025-12-04T09:17:18.1395676Z * [new branch] gh/alexsamardzic/12/head -> origin/gh/alexsamardzic/12/head
2025-12-04T09:17:18.1397701Z * [new branch] gh/alexsamardzic/12/orig -> origin/gh/alexsamardzic/12/orig
2025-12-04T09:17:18.1400214Z * [new branch] gh/alexsamardzic/14/base -> origin/gh/alexsamardzic/14/base
2025-12-04T09:17:18.1402173Z * [new branch] gh/alexsamardzic/14/head -> origin/gh/alexsamardzic/14/head
2025-12-04T09:17:18.1403638Z * [new branch] gh/alexsamardzic/14/orig -> origin/gh/alexsamardzic/14/orig
2025-12-04T09:17:18.1406379Z * [new branch] gh/alexsamardzic/15/base -> origin/gh/alexsamardzic/15/base
2025-12-04T09:17:18.1408466Z * [new branch] gh/alexsamardzic/15/head -> origin/gh/alexsamardzic/15/head
2025-12-04T09:17:18.1413309Z * [new branch] gh/alexsamardzic/15/orig -> origin/gh/alexsamardzic/15/orig
2025-12-04T09:17:18.1416020Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base
2025-12-04T09:17:18.1417934Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head
2025-12-04T09:17:18.1420064Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig
2025-12-04T09:17:18.1423253Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base
2025-12-04T09:17:18.1425079Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head
2025-12-04T09:17:18.1427027Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig
2025-12-04T09:17:18.1429867Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base
2025-12-04T09:17:18.1431693Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head
2025-12-04T09:17:18.1433627Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig
2025-12-04T09:17:18.1436679Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base
2025-12-04T09:17:18.1438898Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig
2025-12-04T09:17:18.1441509Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base
2025-12-04T09:17:18.1443524Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig
2025-12-04T09:17:18.1446040Z * [new branch] gh/andyanwang/39/base -> origin/gh/andyanwang/39/base
2025-12-04T09:17:18.1447964Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head
2025-12-04T09:17:18.1450252Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig
2025-12-04T09:17:18.1452911Z * [new branch] gh/andyanwang/42/base -> origin/gh/andyanwang/42/base
2025-12-04T09:17:18.1454427Z * [new branch] gh/andyanwang/42/head -> origin/gh/andyanwang/42/head
2025-12-04T09:17:18.1456534Z * [new branch] gh/andyanwang/42/orig -> origin/gh/andyanwang/42/orig
2025-12-04T09:17:18.1459382Z * [new branch] gh/andyanwang/45/base -> origin/gh/andyanwang/45/base
2025-12-04T09:17:18.1461236Z * [new branch] gh/andyanwang/45/head -> origin/gh/andyanwang/45/head
2025-12-04T09:17:18.1463026Z * [new branch] gh/andyanwang/45/orig -> origin/gh/andyanwang/45/orig
2025-12-04T09:17:18.1466276Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base
2025-12-04T09:17:18.1467743Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head
2025-12-04T09:17:18.1470456Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base
2025-12-04T09:17:18.1472411Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head
2025-12-04T09:17:18.1474266Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig
2025-12-04T09:17:18.1477014Z * [new branch] gh/angelayi/116/base -> origin/gh/angelayi/116/base
2025-12-04T09:17:18.1478830Z * [new branch] gh/angelayi/116/head -> origin/gh/angelayi/116/head
2025-12-04T09:17:18.1480637Z * [new branch] gh/angelayi/116/orig -> origin/gh/angelayi/116/orig
2025-12-04T09:17:18.1483455Z * [new branch] gh/angelayi/122/base -> origin/gh/angelayi/122/base
2025-12-04T09:17:18.1484966Z * [new branch] gh/angelayi/122/head -> origin/gh/angelayi/122/head
2025-12-04T09:17:18.1486932Z * [new branch] gh/angelayi/122/orig -> origin/gh/angelayi/122/orig
2025-12-04T09:17:18.1489577Z * [new branch] gh/angelayi/124/base -> origin/gh/angelayi/124/base
2025-12-04T09:17:18.1491577Z * [new branch] gh/angelayi/124/head -> origin/gh/angelayi/124/head
2025-12-04T09:17:18.1493352Z * [new branch] gh/angelayi/124/orig -> origin/gh/angelayi/124/orig
2025-12-04T09:17:18.1495821Z * [new branch] gh/angelayi/128/base -> origin/gh/angelayi/128/base
2025-12-04T09:17:18.1497502Z * [new branch] gh/angelayi/128/head -> origin/gh/angelayi/128/head
2025-12-04T09:17:18.1499963Z * [new branch] gh/angelayi/128/orig -> origin/gh/angelayi/128/orig
2025-12-04T09:17:18.1502659Z * [new branch] gh/angelayi/131/base -> origin/gh/angelayi/131/base
2025-12-04T09:17:18.1503818Z * [new branch] gh/angelayi/131/head -> origin/gh/angelayi/131/head
2025-12-04T09:17:18.1505881Z * [new branch] gh/angelayi/131/orig -> origin/gh/angelayi/131/orig
2025-12-04T09:17:18.1508956Z * [new branch] gh/angelayi/132/base -> origin/gh/angelayi/132/base
2025-12-04T09:17:18.1510825Z * [new branch] gh/angelayi/132/head -> origin/gh/angelayi/132/head
2025-12-04T09:17:18.1512760Z * [new branch] gh/angelayi/132/orig -> origin/gh/angelayi/132/orig
2025-12-04T09:17:18.1515274Z * [new branch] gh/angelayi/133/base -> origin/gh/angelayi/133/base
2025-12-04T09:17:18.1517274Z * [new branch] gh/angelayi/133/head -> origin/gh/angelayi/133/head
2025-12-04T09:17:18.1519124Z * [new branch] gh/angelayi/133/orig -> origin/gh/angelayi/133/orig
2025-12-04T09:17:18.1521908Z * [new branch] gh/angelayi/134/base -> origin/gh/angelayi/134/base
2025-12-04T09:17:18.1523863Z * [new branch] gh/angelayi/134/head -> origin/gh/angelayi/134/head
2025-12-04T09:17:18.1525704Z * [new branch] gh/angelayi/134/orig -> origin/gh/angelayi/134/orig
2025-12-04T09:17:18.1528372Z * [new branch] gh/angelayi/135/base -> origin/gh/angelayi/135/base
2025-12-04T09:17:18.1530287Z * [new branch] gh/angelayi/135/head -> origin/gh/angelayi/135/head
2025-12-04T09:17:18.1532087Z * [new branch] gh/angelayi/135/orig -> origin/gh/angelayi/135/orig
2025-12-04T09:17:18.1534612Z * [new branch] gh/angelayi/136/base -> origin/gh/angelayi/136/base
2025-12-04T09:17:18.1536198Z * [new branch] gh/angelayi/136/head -> origin/gh/angelayi/136/head
2025-12-04T09:17:18.1538230Z * [new branch] gh/angelayi/136/orig -> origin/gh/angelayi/136/orig
2025-12-04T09:17:18.1540965Z * [new branch] gh/angelayi/137/base -> origin/gh/angelayi/137/base
2025-12-04T09:17:18.1542773Z * [new branch] gh/angelayi/137/head -> origin/gh/angelayi/137/head
2025-12-04T09:17:18.1544851Z * [new branch] gh/angelayi/137/orig -> origin/gh/angelayi/137/orig
2025-12-04T09:17:18.1547239Z * [new branch] gh/angelayi/138/base -> origin/gh/angelayi/138/base
2025-12-04T09:17:18.1548790Z * [new branch] gh/angelayi/138/head -> origin/gh/angelayi/138/head
2025-12-04T09:17:18.1550791Z * [new branch] gh/angelayi/138/orig -> origin/gh/angelayi/138/orig
2025-12-04T09:17:18.1553259Z * [new branch] gh/angelayi/139/base -> origin/gh/angelayi/139/base
2025-12-04T09:17:18.1555116Z * [new branch] gh/angelayi/139/head -> origin/gh/angelayi/139/head
2025-12-04T09:17:18.1557001Z * [new branch] gh/angelayi/139/orig -> origin/gh/angelayi/139/orig
2025-12-04T09:17:18.1559687Z * [new branch] gh/angelayi/140/base -> origin/gh/angelayi/140/base
2025-12-04T09:17:18.1561538Z * [new branch] gh/angelayi/140/head -> origin/gh/angelayi/140/head
2025-12-04T09:17:18.1563435Z * [new branch] gh/angelayi/140/orig -> origin/gh/angelayi/140/orig
2025-12-04T09:17:18.1566700Z * [new branch] gh/angelayi/141/base -> origin/gh/angelayi/141/base
2025-12-04T09:17:18.1568254Z * [new branch] gh/angelayi/141/head -> origin/gh/angelayi/141/head
2025-12-04T09:17:18.1570294Z * [new branch] gh/angelayi/141/orig -> origin/gh/angelayi/141/orig
2025-12-04T09:17:18.1572998Z * [new branch] gh/angelayi/142/base -> origin/gh/angelayi/142/base
2025-12-04T09:17:18.1574793Z * [new branch] gh/angelayi/142/head -> origin/gh/angelayi/142/head
2025-12-04T09:17:18.1576366Z * [new branch] gh/angelayi/142/orig -> origin/gh/angelayi/142/orig
2025-12-04T09:17:18.1579257Z * [new branch] gh/angelayi/143/base -> origin/gh/angelayi/143/base
2025-12-04T09:17:18.1581219Z * [new branch] gh/angelayi/143/head -> origin/gh/angelayi/143/head
2025-12-04T09:17:18.1583020Z * [new branch] gh/angelayi/143/orig -> origin/gh/angelayi/143/orig
2025-12-04T09:17:18.1585642Z * [new branch] gh/angelayi/144/base -> origin/gh/angelayi/144/base
2025-12-04T09:17:18.1587631Z * [new branch] gh/angelayi/144/head -> origin/gh/angelayi/144/head
2025-12-04T09:17:18.1589476Z * [new branch] gh/angelayi/144/orig -> origin/gh/angelayi/144/orig
2025-12-04T09:17:18.1592832Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base
2025-12-04T09:17:18.1594675Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head
2025-12-04T09:17:18.1596199Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig
2025-12-04T09:17:18.1599153Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base
2025-12-04T09:17:18.1601014Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head
2025-12-04T09:17:18.1602882Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig
2025-12-04T09:17:18.1605411Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base
2025-12-04T09:17:18.1607281Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head
2025-12-04T09:17:18.1609352Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig
2025-12-04T09:17:18.1639183Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base
2025-12-04T09:17:18.1639967Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head
2025-12-04T09:17:18.1640667Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig
2025-12-04T09:17:18.1641411Z * [new branch] gh/anijain2305/870/base -> origin/gh/anijain2305/870/base
2025-12-04T09:17:18.1642025Z * [new branch] gh/anijain2305/870/head -> origin/gh/anijain2305/870/head
2025-12-04T09:17:18.1642610Z * [new branch] gh/anijain2305/870/orig -> origin/gh/anijain2305/870/orig
2025-12-04T09:17:18.1643423Z * [new branch] gh/anijain2305/873/base -> origin/gh/anijain2305/873/base
2025-12-04T09:17:18.1644195Z * [new branch] gh/anijain2305/873/head -> origin/gh/anijain2305/873/head
2025-12-04T09:17:18.1644799Z * [new branch] gh/anijain2305/873/orig -> origin/gh/anijain2305/873/orig
2025-12-04T09:17:18.1645551Z * [new branch] gh/anijain2305/894/base -> origin/gh/anijain2305/894/base
2025-12-04T09:17:18.1646185Z * [new branch] gh/anijain2305/894/head -> origin/gh/anijain2305/894/head
2025-12-04T09:17:18.1646825Z * [new branch] gh/anijain2305/894/orig -> origin/gh/anijain2305/894/orig
2025-12-04T09:17:18.1647560Z * [new branch] gh/anijain2305/895/base -> origin/gh/anijain2305/895/base
2025-12-04T09:17:18.1648147Z * [new branch] gh/anijain2305/895/head -> origin/gh/anijain2305/895/head
2025-12-04T09:17:18.1649022Z * [new branch] gh/anijain2305/895/orig -> origin/gh/anijain2305/895/orig
2025-12-04T09:17:18.1649766Z * [new branch] gh/anijain2305/910/base -> origin/gh/anijain2305/910/base
2025-12-04T09:17:18.1650465Z * [new branch] gh/anijain2305/910/head -> origin/gh/anijain2305/910/head
2025-12-04T09:17:18.1651247Z * [new branch] gh/anijain2305/910/orig -> origin/gh/anijain2305/910/orig
2025-12-04T09:17:18.1652001Z * [new branch] gh/anijain2305/919/base -> origin/gh/anijain2305/919/base
2025-12-04T09:17:18.1652602Z * [new branch] gh/anijain2305/919/head -> origin/gh/anijain2305/919/head
2025-12-04T09:17:18.1653308Z * [new branch] gh/anijain2305/919/orig -> origin/gh/anijain2305/919/orig
2025-12-04T09:17:18.1656064Z * [new branch] gh/anijain2305/922/base -> origin/gh/anijain2305/922/base
2025-12-04T09:17:18.1657574Z * [new branch] gh/anijain2305/922/head -> origin/gh/anijain2305/922/head
2025-12-04T09:17:18.1659759Z * [new branch] gh/anijain2305/922/orig -> origin/gh/anijain2305/922/orig
2025-12-04T09:17:18.1662330Z * [new branch] gh/anijain2305/932/base -> origin/gh/anijain2305/932/base
2025-12-04T09:17:18.1664314Z * [new branch] gh/anijain2305/932/head -> origin/gh/anijain2305/932/head
2025-12-04T09:17:18.1666270Z * [new branch] gh/anijain2305/932/orig -> origin/gh/anijain2305/932/orig
2025-12-04T09:17:18.1668803Z * [new branch] gh/anijain2305/940/base -> origin/gh/anijain2305/940/base
2025-12-04T09:17:18.1670625Z * [new branch] gh/anijain2305/940/head -> origin/gh/anijain2305/940/head
2025-12-04T09:17:18.1672456Z * [new branch] gh/anijain2305/940/orig -> origin/gh/anijain2305/940/orig
2025-12-04T09:17:18.1675104Z * [new branch] gh/anijain2305/941/base -> origin/gh/anijain2305/941/base
2025-12-04T09:17:18.1676939Z * [new branch] gh/anijain2305/941/head -> origin/gh/anijain2305/941/head
2025-12-04T09:17:18.1678740Z * [new branch] gh/anijain2305/941/orig -> origin/gh/anijain2305/941/orig
2025-12-04T09:17:18.1681302Z * [new branch] gh/anijain2305/942/base -> origin/gh/anijain2305/942/base
2025-12-04T09:17:18.1683176Z * [new branch] gh/anijain2305/942/head -> origin/gh/anijain2305/942/head
2025-12-04T09:17:18.1685081Z * [new branch] gh/anijain2305/942/orig -> origin/gh/anijain2305/942/orig
2025-12-04T09:17:18.1687647Z * [new branch] gh/anijain2305/943/base -> origin/gh/anijain2305/943/base
2025-12-04T09:17:18.1689489Z * [new branch] gh/anijain2305/943/head -> origin/gh/anijain2305/943/head
2025-12-04T09:17:18.1691281Z * [new branch] gh/anijain2305/943/orig -> origin/gh/anijain2305/943/orig
2025-12-04T09:17:18.1694515Z * [new branch] gh/anijain2305/944/base -> origin/gh/anijain2305/944/base
2025-12-04T09:17:18.1696352Z * [new branch] gh/anijain2305/944/head -> origin/gh/anijain2305/944/head
2025-12-04T09:17:18.1698623Z * [new branch] gh/anijain2305/944/orig -> origin/gh/anijain2305/944/orig
2025-12-04T09:17:18.1701436Z * [new branch] gh/anijain2305/945/base -> origin/gh/anijain2305/945/base
2025-12-04T09:17:18.1703285Z * [new branch] gh/anijain2305/945/head -> origin/gh/anijain2305/945/head
2025-12-04T09:17:18.1705223Z * [new branch] gh/anijain2305/945/orig -> origin/gh/anijain2305/945/orig
2025-12-04T09:17:18.1708149Z * [new branch] gh/anijain2305/946/base -> origin/gh/anijain2305/946/base
2025-12-04T09:17:18.1710068Z * [new branch] gh/anijain2305/946/head -> origin/gh/anijain2305/946/head
2025-12-04T09:17:18.1712017Z * [new branch] gh/anijain2305/946/orig -> origin/gh/anijain2305/946/orig
2025-12-04T09:17:18.1714628Z * [new branch] gh/anijain2305/947/base -> origin/gh/anijain2305/947/base
2025-12-04T09:17:18.1716050Z * [new branch] gh/anijain2305/947/head -> origin/gh/anijain2305/947/head
2025-12-04T09:17:18.1718025Z * [new branch] gh/anijain2305/947/orig -> origin/gh/anijain2305/947/orig
2025-12-04T09:17:18.1720885Z * [new branch] gh/anijain2305/948/base -> origin/gh/anijain2305/948/base
2025-12-04T09:17:18.1722458Z * [new branch] gh/anijain2305/948/head -> origin/gh/anijain2305/948/head
2025-12-04T09:17:18.1724395Z * [new branch] gh/anijain2305/948/orig -> origin/gh/anijain2305/948/orig
2025-12-04T09:17:18.1727219Z * [new branch] gh/anijain2305/949/base -> origin/gh/anijain2305/949/base
2025-12-04T09:17:18.1729124Z * [new branch] gh/anijain2305/949/head -> origin/gh/anijain2305/949/head
2025-12-04T09:17:18.1730986Z * [new branch] gh/anijain2305/949/orig -> origin/gh/anijain2305/949/orig
2025-12-04T09:17:18.1733572Z * [new branch] gh/anijain2305/950/base -> origin/gh/anijain2305/950/base
2025-12-04T09:17:18.1735430Z * [new branch] gh/anijain2305/950/head -> origin/gh/anijain2305/950/head
2025-12-04T09:17:18.1737029Z * [new branch] gh/anijain2305/950/orig -> origin/gh/anijain2305/950/orig
2025-12-04T09:17:18.1740077Z * [new branch] gh/anijain2305/951/base -> origin/gh/anijain2305/951/base
2025-12-04T09:17:18.1741601Z * [new branch] gh/anijain2305/951/head -> origin/gh/anijain2305/951/head
2025-12-04T09:17:18.1743673Z * [new branch] gh/anijain2305/951/orig -> origin/gh/anijain2305/951/orig
2025-12-04T09:17:18.1746396Z * [new branch] gh/anijain2305/952/base -> origin/gh/anijain2305/952/base
2025-12-04T09:17:18.1748230Z * [new branch] gh/anijain2305/952/head -> origin/gh/anijain2305/952/head
2025-12-04T09:17:18.1750044Z * [new branch] gh/anijain2305/952/orig -> origin/gh/anijain2305/952/orig
2025-12-04T09:17:18.1752632Z * [new branch] gh/anijain2305/953/base -> origin/gh/anijain2305/953/base
2025-12-04T09:17:18.1754494Z * [new branch] gh/anijain2305/953/head -> origin/gh/anijain2305/953/head
2025-12-04T09:17:18.1756329Z * [new branch] gh/anijain2305/953/orig -> origin/gh/anijain2305/953/orig
2025-12-04T09:17:18.1758907Z * [new branch] gh/anijain2305/954/base -> origin/gh/anijain2305/954/base
2025-12-04T09:17:18.1760910Z * [new branch] gh/anijain2305/954/head -> origin/gh/anijain2305/954/head
2025-12-04T09:17:18.1762732Z * [new branch] gh/anijain2305/954/orig -> origin/gh/anijain2305/954/orig
2025-12-04T09:17:18.1765435Z * [new branch] gh/anijain2305/955/base -> origin/gh/anijain2305/955/base
2025-12-04T09:17:18.1767282Z * [new branch] gh/anijain2305/955/head -> origin/gh/anijain2305/955/head
2025-12-04T09:17:18.1769122Z * [new branch] gh/anijain2305/955/orig -> origin/gh/anijain2305/955/orig
2025-12-04T09:17:18.1771943Z * [new branch] gh/anijain2305/956/base -> origin/gh/anijain2305/956/base
2025-12-04T09:17:18.1773758Z * [new branch] gh/anijain2305/956/head -> origin/gh/anijain2305/956/head
2025-12-04T09:17:18.1775578Z * [new branch] gh/anijain2305/956/orig -> origin/gh/anijain2305/956/orig
2025-12-04T09:17:18.1778596Z * [new branch] gh/anijain2305/957/base -> origin/gh/anijain2305/957/base
2025-12-04T09:17:18.1780641Z * [new branch] gh/anijain2305/957/head -> origin/gh/anijain2305/957/head
2025-12-04T09:17:18.1782472Z * [new branch] gh/anijain2305/957/orig -> origin/gh/anijain2305/957/orig
2025-12-04T09:17:18.1785639Z * [new branch] gh/anijain2305/958/base -> origin/gh/anijain2305/958/base
2025-12-04T09:17:18.1787642Z * [new branch] gh/anijain2305/958/head -> origin/gh/anijain2305/958/head
2025-12-04T09:17:18.1789126Z * [new branch] gh/anijain2305/958/orig -> origin/gh/anijain2305/958/orig
2025-12-04T09:17:18.1791891Z * [new branch] gh/anijain2305/959/base -> origin/gh/anijain2305/959/base
2025-12-04T09:17:18.1793617Z * [new branch] gh/anijain2305/959/head -> origin/gh/anijain2305/959/head
2025-12-04T09:17:18.1795561Z * [new branch] gh/anijain2305/959/orig -> origin/gh/anijain2305/959/orig
2025-12-04T09:17:18.1798357Z * [new branch] gh/anijain2305/960/base -> origin/gh/anijain2305/960/base
2025-12-04T09:17:18.1800247Z * [new branch] gh/anijain2305/960/head -> origin/gh/anijain2305/960/head
2025-12-04T09:17:18.1802113Z * [new branch] gh/anijain2305/960/orig -> origin/gh/anijain2305/960/orig
2025-12-04T09:17:18.1804899Z * [new branch] gh/anijain2305/961/base -> origin/gh/anijain2305/961/base
2025-12-04T09:17:18.1806703Z * [new branch] gh/anijain2305/961/head -> origin/gh/anijain2305/961/head
2025-12-04T09:17:18.1808467Z * [new branch] gh/anijain2305/961/orig -> origin/gh/anijain2305/961/orig
2025-12-04T09:17:18.1813311Z * [new branch] gh/anijain2305/962/base -> origin/gh/anijain2305/962/base
2025-12-04T09:17:18.1814695Z * [new branch] gh/anijain2305/962/head -> origin/gh/anijain2305/962/head
2025-12-04T09:17:18.1816812Z * [new branch] gh/anijain2305/962/orig -> origin/gh/anijain2305/962/orig
2025-12-04T09:17:18.1819892Z * 
[new branch] gh/anijain2305/963/base -> origin/gh/anijain2305/963/base 2025-12-04T09:17:18.1821648Z * [new branch] gh/anijain2305/963/head -> origin/gh/anijain2305/963/head 2025-12-04T09:17:18.1823835Z * [new branch] gh/anijain2305/963/orig -> origin/gh/anijain2305/963/orig 2025-12-04T09:17:18.1826543Z * [new branch] gh/anijain2305/964/base -> origin/gh/anijain2305/964/base 2025-12-04T09:17:18.1827981Z * [new branch] gh/anijain2305/964/head -> origin/gh/anijain2305/964/head 2025-12-04T09:17:18.1830059Z * [new branch] gh/anijain2305/964/orig -> origin/gh/anijain2305/964/orig 2025-12-04T09:17:18.1832754Z * [new branch] gh/anijain2305/965/base -> origin/gh/anijain2305/965/base 2025-12-04T09:17:18.1835011Z * [new branch] gh/anijain2305/965/head -> origin/gh/anijain2305/965/head 2025-12-04T09:17:18.1837851Z * [new branch] gh/anijain2305/965/orig -> origin/gh/anijain2305/965/orig 2025-12-04T09:17:18.1841353Z * [new branch] gh/anijain2305/966/base -> origin/gh/anijain2305/966/base 2025-12-04T09:17:18.1844178Z * [new branch] gh/anijain2305/966/head -> origin/gh/anijain2305/966/head 2025-12-04T09:17:18.1846822Z * [new branch] gh/anijain2305/966/orig -> origin/gh/anijain2305/966/orig 2025-12-04T09:17:18.1850325Z * [new branch] gh/anijain2305/967/base -> origin/gh/anijain2305/967/base 2025-12-04T09:17:18.1852820Z * [new branch] gh/anijain2305/967/head -> origin/gh/anijain2305/967/head 2025-12-04T09:17:18.1855485Z * [new branch] gh/anijain2305/967/orig -> origin/gh/anijain2305/967/orig 2025-12-04T09:17:18.1858836Z * [new branch] gh/anijain2305/968/base -> origin/gh/anijain2305/968/base 2025-12-04T09:17:18.1861508Z * [new branch] gh/anijain2305/968/head -> origin/gh/anijain2305/968/head 2025-12-04T09:17:18.1863882Z * [new branch] gh/anijain2305/968/orig -> origin/gh/anijain2305/968/orig 2025-12-04T09:17:18.1867516Z * [new branch] gh/anijain2305/969/base -> origin/gh/anijain2305/969/base 2025-12-04T09:17:18.1869989Z * [new branch] gh/anijain2305/969/head -> 
origin/gh/anijain2305/969/head 2025-12-04T09:17:18.1873018Z * [new branch] gh/anijain2305/969/orig -> origin/gh/anijain2305/969/orig 2025-12-04T09:17:18.1876212Z * [new branch] gh/anijain2305/970/base -> origin/gh/anijain2305/970/base 2025-12-04T09:17:18.1877654Z * [new branch] gh/anijain2305/970/head -> origin/gh/anijain2305/970/head 2025-12-04T09:17:18.1879591Z * [new branch] gh/anijain2305/970/orig -> origin/gh/anijain2305/970/orig 2025-12-04T09:17:18.1883055Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-12-04T09:17:18.1884848Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-12-04T09:17:18.1886904Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-12-04T09:17:18.1890246Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-12-04T09:17:18.1891740Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-12-04T09:17:18.1894337Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-12-04T09:17:18.1895875Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-12-04T09:17:18.1898484Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-12-04T09:17:18.1900369Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-12-04T09:17:18.1902817Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-12-04T09:17:18.1904274Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-12-04T09:17:18.1907003Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-12-04T09:17:18.1908662Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-12-04T09:17:18.1911792Z * [new branch] gh/anshul-si/53/base -> origin/gh/anshul-si/53/base 2025-12-04T09:17:18.1913516Z * [new branch] gh/anshul-si/53/head -> origin/gh/anshul-si/53/head 2025-12-04T09:17:18.1916339Z * [new branch] gh/anshul-si/58/base -> origin/gh/anshul-si/58/base 2025-12-04T09:17:18.1917809Z * [new 
branch] gh/anshul-si/58/head -> origin/gh/anshul-si/58/head 2025-12-04T09:17:18.1920530Z * [new branch] gh/anshul-si/66/base -> origin/gh/anshul-si/66/base 2025-12-04T09:17:18.1922238Z * [new branch] gh/anshul-si/66/head -> origin/gh/anshul-si/66/head 2025-12-04T09:17:18.1924261Z * [new branch] gh/anshul-si/66/orig -> origin/gh/anshul-si/66/orig 2025-12-04T09:17:18.1926662Z * [new branch] gh/anshul-si/67/base -> origin/gh/anshul-si/67/base 2025-12-04T09:17:18.1928144Z * [new branch] gh/anshul-si/67/head -> origin/gh/anshul-si/67/head 2025-12-04T09:17:18.1930198Z * [new branch] gh/anshul-si/67/orig -> origin/gh/anshul-si/67/orig 2025-12-04T09:17:18.1933004Z * [new branch] gh/anshul-si/68/base -> origin/gh/anshul-si/68/base 2025-12-04T09:17:18.1934716Z * [new branch] gh/anshul-si/68/head -> origin/gh/anshul-si/68/head 2025-12-04T09:17:18.1936755Z * [new branch] gh/anshul-si/68/orig -> origin/gh/anshul-si/68/orig 2025-12-04T09:17:18.1939652Z * [new branch] gh/anshul-si/69/base -> origin/gh/anshul-si/69/base 2025-12-04T09:17:18.1941112Z * [new branch] gh/anshul-si/69/head -> origin/gh/anshul-si/69/head 2025-12-04T09:17:18.1943232Z * [new branch] gh/anshul-si/69/orig -> origin/gh/anshul-si/69/orig 2025-12-04T09:17:18.1945873Z * [new branch] gh/anshul-si/70/base -> origin/gh/anshul-si/70/base 2025-12-04T09:17:18.1947340Z * [new branch] gh/anshul-si/70/head -> origin/gh/anshul-si/70/head 2025-12-04T09:17:18.1949678Z * [new branch] gh/anshul-si/70/orig -> origin/gh/anshul-si/70/orig 2025-12-04T09:17:18.1952085Z * [new branch] gh/anshul-si/71/base -> origin/gh/anshul-si/71/base 2025-12-04T09:17:18.1953805Z * [new branch] gh/anshul-si/71/head -> origin/gh/anshul-si/71/head 2025-12-04T09:17:18.1955515Z * [new branch] gh/anshul-si/71/orig -> origin/gh/anshul-si/71/orig 2025-12-04T09:17:18.1958313Z * [new branch] gh/anshul-si/72/base -> origin/gh/anshul-si/72/base 2025-12-04T09:17:18.1960418Z * [new branch] gh/anshul-si/72/head -> origin/gh/anshul-si/72/head 
2025-12-04T09:17:18.1961899Z * [new branch] gh/anshul-si/72/orig -> origin/gh/anshul-si/72/orig 2025-12-04T09:17:18.1964647Z * [new branch] gh/anshul-si/73/base -> origin/gh/anshul-si/73/base 2025-12-04T09:17:18.1966667Z * [new branch] gh/anshul-si/73/head -> origin/gh/anshul-si/73/head 2025-12-04T09:17:18.1968138Z * [new branch] gh/anshul-si/73/orig -> origin/gh/anshul-si/73/orig 2025-12-04T09:17:18.1971573Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-12-04T09:17:18.1973303Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-12-04T09:17:18.1976223Z * [new branch] gh/aorenste/134/base -> origin/gh/aorenste/134/base 2025-12-04T09:17:18.1978383Z * [new branch] gh/aorenste/134/head -> origin/gh/aorenste/134/head 2025-12-04T09:17:18.1980507Z * [new branch] gh/aorenste/134/orig -> origin/gh/aorenste/134/orig 2025-12-04T09:17:18.1983123Z * [new branch] gh/aorenste/139/base -> origin/gh/aorenste/139/base 2025-12-04T09:17:18.1984846Z * [new branch] gh/aorenste/139/head -> origin/gh/aorenste/139/head 2025-12-04T09:17:18.1986913Z * [new branch] gh/aorenste/139/orig -> origin/gh/aorenste/139/orig 2025-12-04T09:17:18.1989453Z * [new branch] gh/aorenste/141/base -> origin/gh/aorenste/141/base 2025-12-04T09:17:18.1991007Z * [new branch] gh/aorenste/141/head -> origin/gh/aorenste/141/head 2025-12-04T09:17:18.1994021Z * [new branch] gh/aorenste/145/base -> origin/gh/aorenste/145/base 2025-12-04T09:17:18.1995962Z * [new branch] gh/aorenste/145/head -> origin/gh/aorenste/145/head 2025-12-04T09:17:18.1997977Z * [new branch] gh/aorenste/145/orig -> origin/gh/aorenste/145/orig 2025-12-04T09:17:18.2000543Z * [new branch] gh/aorenste/146/base -> origin/gh/aorenste/146/base 2025-12-04T09:17:18.2002500Z * [new branch] gh/aorenste/146/head -> origin/gh/aorenste/146/head 2025-12-04T09:17:18.2004502Z * [new branch] gh/aorenste/146/orig -> origin/gh/aorenste/146/orig 2025-12-04T09:17:18.2007066Z * [new branch] gh/aorenste/147/base -> 
origin/gh/aorenste/147/base 2025-12-04T09:17:18.2008807Z * [new branch] gh/aorenste/147/head -> origin/gh/aorenste/147/head 2025-12-04T09:17:18.2011095Z * [new branch] gh/aorenste/147/orig -> origin/gh/aorenste/147/orig 2025-12-04T09:17:18.2013710Z * [new branch] gh/aorenste/148/base -> origin/gh/aorenste/148/base 2025-12-04T09:17:18.2015231Z * [new branch] gh/aorenste/148/head -> origin/gh/aorenste/148/head 2025-12-04T09:17:18.2017340Z * [new branch] gh/aorenste/148/orig -> origin/gh/aorenste/148/orig 2025-12-04T09:17:18.2020133Z * [new branch] gh/aorenste/149/base -> origin/gh/aorenste/149/base 2025-12-04T09:17:18.2021556Z * [new branch] gh/aorenste/149/head -> origin/gh/aorenste/149/head 2025-12-04T09:17:18.2023677Z * [new branch] gh/aorenste/149/orig -> origin/gh/aorenste/149/orig 2025-12-04T09:17:18.2026476Z * [new branch] gh/aorenste/150/base -> origin/gh/aorenste/150/base 2025-12-04T09:17:18.2027768Z * [new branch] gh/aorenste/150/head -> origin/gh/aorenste/150/head 2025-12-04T09:17:18.2029792Z * [new branch] gh/aorenste/150/orig -> origin/gh/aorenste/150/orig 2025-12-04T09:17:18.2032315Z * [new branch] gh/aorenste/151/base -> origin/gh/aorenste/151/base 2025-12-04T09:17:18.2033865Z * [new branch] gh/aorenste/151/head -> origin/gh/aorenste/151/head 2025-12-04T09:17:18.2036094Z * [new branch] gh/aorenste/151/orig -> origin/gh/aorenste/151/orig 2025-12-04T09:17:18.2038668Z * [new branch] gh/aorenste/152/base -> origin/gh/aorenste/152/base 2025-12-04T09:17:18.2040206Z * [new branch] gh/aorenste/152/head -> origin/gh/aorenste/152/head 2025-12-04T09:17:18.2042257Z * [new branch] gh/aorenste/152/orig -> origin/gh/aorenste/152/orig 2025-12-04T09:17:18.2044733Z * [new branch] gh/aorenste/153/base -> origin/gh/aorenste/153/base 2025-12-04T09:17:18.2046228Z * [new branch] gh/aorenste/153/head -> origin/gh/aorenste/153/head 2025-12-04T09:17:18.2048340Z * [new branch] gh/aorenste/153/orig -> origin/gh/aorenste/153/orig 2025-12-04T09:17:18.2050956Z * [new branch] 
gh/aorenste/154/base -> origin/gh/aorenste/154/base 2025-12-04T09:17:18.2052395Z * [new branch] gh/aorenste/154/head -> origin/gh/aorenste/154/head 2025-12-04T09:17:18.2054558Z * [new branch] gh/aorenste/154/orig -> origin/gh/aorenste/154/orig 2025-12-04T09:17:18.2056868Z * [new branch] gh/aorenste/155/base -> origin/gh/aorenste/155/base 2025-12-04T09:17:18.2058417Z * [new branch] gh/aorenste/155/head -> origin/gh/aorenste/155/head 2025-12-04T09:17:18.2060748Z * [new branch] gh/aorenste/155/orig -> origin/gh/aorenste/155/orig 2025-12-04T09:17:18.2062996Z * [new branch] gh/aorenste/156/base -> origin/gh/aorenste/156/base 2025-12-04T09:17:18.2064770Z * [new branch] gh/aorenste/156/head -> origin/gh/aorenste/156/head 2025-12-04T09:17:18.2066806Z * [new branch] gh/aorenste/156/orig -> origin/gh/aorenste/156/orig 2025-12-04T09:17:18.2069648Z * [new branch] gh/aorenste/157/base -> origin/gh/aorenste/157/base 2025-12-04T09:17:18.2071372Z * [new branch] gh/aorenste/157/head -> origin/gh/aorenste/157/head 2025-12-04T09:17:18.2073293Z * [new branch] gh/aorenste/157/orig -> origin/gh/aorenste/157/orig 2025-12-04T09:17:18.2075719Z * [new branch] gh/aorenste/158/base -> origin/gh/aorenste/158/base 2025-12-04T09:17:18.2077443Z * [new branch] gh/aorenste/158/head -> origin/gh/aorenste/158/head 2025-12-04T09:17:18.2079435Z * [new branch] gh/aorenste/158/orig -> origin/gh/aorenste/158/orig 2025-12-04T09:17:18.2081855Z * [new branch] gh/aorenste/159/base -> origin/gh/aorenste/159/base 2025-12-04T09:17:18.2083582Z * [new branch] gh/aorenste/159/head -> origin/gh/aorenste/159/head 2025-12-04T09:17:18.2085517Z * [new branch] gh/aorenste/159/orig -> origin/gh/aorenste/159/orig 2025-12-04T09:17:18.2088724Z * [new branch] gh/avikchaudhuri/1/base -> origin/gh/avikchaudhuri/1/base 2025-12-04T09:17:18.2090551Z * [new branch] gh/avikchaudhuri/1/head -> origin/gh/avikchaudhuri/1/head 2025-12-04T09:17:18.2093015Z * [new branch] gh/avikchaudhuri/2/base -> origin/gh/avikchaudhuri/2/base 
2025-12-04T09:17:18.2094464Z * [new branch] gh/avikchaudhuri/2/head -> origin/gh/avikchaudhuri/2/head 2025-12-04T09:17:18.2096506Z * [new branch] gh/avikchaudhuri/2/orig -> origin/gh/avikchaudhuri/2/orig 2025-12-04T09:17:18.2100255Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-12-04T09:17:18.2101576Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-12-04T09:17:18.2103815Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-12-04T09:17:18.2106234Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-12-04T09:17:18.2108476Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-12-04T09:17:18.2109992Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-12-04T09:17:18.2112822Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-12-04T09:17:18.2114456Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-12-04T09:17:18.2116267Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-12-04T09:17:18.2118976Z * [new branch] gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-12-04T09:17:18.2121231Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-12-04T09:17:18.2123384Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-12-04T09:17:18.2125817Z * [new branch] gh/bdhirsh/672/base -> origin/gh/bdhirsh/672/base 2025-12-04T09:17:18.2127721Z * [new branch] gh/bdhirsh/672/head -> origin/gh/bdhirsh/672/head 2025-12-04T09:17:18.2129701Z * [new branch] gh/bdhirsh/672/orig -> origin/gh/bdhirsh/672/orig 2025-12-04T09:17:18.2132205Z * [new branch] gh/bdhirsh/675/base -> origin/gh/bdhirsh/675/base 2025-12-04T09:17:18.2134248Z * [new branch] gh/bdhirsh/675/head -> origin/gh/bdhirsh/675/head 2025-12-04T09:17:18.2136051Z * [new branch] gh/bdhirsh/675/orig -> origin/gh/bdhirsh/675/orig 2025-12-04T09:17:18.2138564Z * [new branch] gh/bdhirsh/676/base -> origin/gh/bdhirsh/676/base 
2025-12-04T09:17:18.2140725Z * [new branch] gh/bdhirsh/676/head -> origin/gh/bdhirsh/676/head 2025-12-04T09:17:18.2142518Z * [new branch] gh/bdhirsh/676/orig -> origin/gh/bdhirsh/676/orig 2025-12-04T09:17:18.2145017Z * [new branch] gh/bdhirsh/677/base -> origin/gh/bdhirsh/677/base 2025-12-04T09:17:18.2147206Z * [new branch] gh/bdhirsh/677/head -> origin/gh/bdhirsh/677/head 2025-12-04T09:17:18.2149074Z * [new branch] gh/bdhirsh/677/orig -> origin/gh/bdhirsh/677/orig 2025-12-04T09:17:18.2151780Z * [new branch] gh/bdhirsh/678/base -> origin/gh/bdhirsh/678/base 2025-12-04T09:17:18.2153736Z * [new branch] gh/bdhirsh/678/head -> origin/gh/bdhirsh/678/head 2025-12-04T09:17:18.2155591Z * [new branch] gh/bdhirsh/678/orig -> origin/gh/bdhirsh/678/orig 2025-12-04T09:17:18.2158271Z * [new branch] gh/bdhirsh/679/base -> origin/gh/bdhirsh/679/base 2025-12-04T09:17:18.2160307Z * [new branch] gh/bdhirsh/679/head -> origin/gh/bdhirsh/679/head 2025-12-04T09:17:18.2162163Z * [new branch] gh/bdhirsh/679/orig -> origin/gh/bdhirsh/679/orig 2025-12-04T09:17:18.2164769Z * [new branch] gh/bdhirsh/680/base -> origin/gh/bdhirsh/680/base 2025-12-04T09:17:18.2166608Z * [new branch] gh/bdhirsh/680/head -> origin/gh/bdhirsh/680/head 2025-12-04T09:17:18.2168469Z * [new branch] gh/bdhirsh/680/orig -> origin/gh/bdhirsh/680/orig 2025-12-04T09:17:18.2170806Z * [new branch] gh/bdhirsh/681/base -> origin/gh/bdhirsh/681/base 2025-12-04T09:17:18.2172798Z * [new branch] gh/bdhirsh/681/head -> origin/gh/bdhirsh/681/head 2025-12-04T09:17:18.2174774Z * [new branch] gh/bdhirsh/681/orig -> origin/gh/bdhirsh/681/orig 2025-12-04T09:17:18.2177712Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-12-04T09:17:18.2179713Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-12-04T09:17:18.2181600Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-12-04T09:17:18.2184441Z * [new branch] gh/benjaminglass1/102/base -> 
origin/gh/benjaminglass1/102/base 2025-12-04T09:17:18.2186149Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-12-04T09:17:18.2187976Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-12-04T09:17:18.2190556Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-12-04T09:17:18.2192405Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-12-04T09:17:18.2194065Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-12-04T09:17:18.2196609Z * [new branch] gh/benjaminglass1/107/base -> origin/gh/benjaminglass1/107/base 2025-12-04T09:17:18.2198448Z * [new branch] gh/benjaminglass1/107/head -> origin/gh/benjaminglass1/107/head 2025-12-04T09:17:18.2200313Z * [new branch] gh/benjaminglass1/107/orig -> origin/gh/benjaminglass1/107/orig 2025-12-04T09:17:18.2202808Z * [new branch] gh/benjaminglass1/108/base -> origin/gh/benjaminglass1/108/base 2025-12-04T09:17:18.2204652Z * [new branch] gh/benjaminglass1/108/head -> origin/gh/benjaminglass1/108/head 2025-12-04T09:17:18.2206459Z * [new branch] gh/benjaminglass1/108/orig -> origin/gh/benjaminglass1/108/orig 2025-12-04T09:17:18.2209423Z * [new branch] gh/benjaminglass1/109/base -> origin/gh/benjaminglass1/109/base 2025-12-04T09:17:18.2211128Z * [new branch] gh/benjaminglass1/109/head -> origin/gh/benjaminglass1/109/head 2025-12-04T09:17:18.2212956Z * [new branch] gh/benjaminglass1/109/orig -> origin/gh/benjaminglass1/109/orig 2025-12-04T09:17:18.2215497Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-12-04T09:17:18.2217294Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-12-04T09:17:18.2219258Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-12-04T09:17:18.2222925Z * [new branch] gh/bobrenjc93/570/base -> origin/gh/bobrenjc93/570/base 2025-12-04T09:17:18.2225122Z * [new 
branch] gh/bobrenjc93/570/head -> origin/gh/bobrenjc93/570/head 2025-12-04T09:17:18.2226627Z * [new branch] gh/bobrenjc93/570/orig -> origin/gh/bobrenjc93/570/orig 2025-12-04T09:17:18.2229268Z * [new branch] gh/bobrenjc93/604/base -> origin/gh/bobrenjc93/604/base 2025-12-04T09:17:18.2231195Z * [new branch] gh/bobrenjc93/604/head -> origin/gh/bobrenjc93/604/head 2025-12-04T09:17:18.2233016Z * [new branch] gh/bobrenjc93/604/orig -> origin/gh/bobrenjc93/604/orig 2025-12-04T09:17:18.2235675Z * [new branch] gh/bobrenjc93/638/base -> origin/gh/bobrenjc93/638/base 2025-12-04T09:17:18.2237480Z * [new branch] gh/bobrenjc93/638/head -> origin/gh/bobrenjc93/638/head 2025-12-04T09:17:18.2239301Z * [new branch] gh/bobrenjc93/638/orig -> origin/gh/bobrenjc93/638/orig 2025-12-04T09:17:18.2241808Z * [new branch] gh/bobrenjc93/653/base -> origin/gh/bobrenjc93/653/base 2025-12-04T09:17:18.2243667Z * [new branch] gh/bobrenjc93/653/head -> origin/gh/bobrenjc93/653/head 2025-12-04T09:17:18.2245503Z * [new branch] gh/bobrenjc93/653/orig -> origin/gh/bobrenjc93/653/orig 2025-12-04T09:17:18.2248343Z * [new branch] gh/bobrenjc93/654/base -> origin/gh/bobrenjc93/654/base 2025-12-04T09:17:18.2250171Z * [new branch] gh/bobrenjc93/654/head -> origin/gh/bobrenjc93/654/head 2025-12-04T09:17:18.2251895Z * [new branch] gh/bobrenjc93/654/orig -> origin/gh/bobrenjc93/654/orig 2025-12-04T09:17:18.2254448Z * [new branch] gh/bobrenjc93/657/base -> origin/gh/bobrenjc93/657/base 2025-12-04T09:17:18.2256238Z * [new branch] gh/bobrenjc93/657/head -> origin/gh/bobrenjc93/657/head 2025-12-04T09:17:18.2258029Z * [new branch] gh/bobrenjc93/657/orig -> origin/gh/bobrenjc93/657/orig 2025-12-04T09:17:18.2261061Z * [new branch] gh/bobrenjc93/672/base -> origin/gh/bobrenjc93/672/base 2025-12-04T09:17:18.2262678Z * [new branch] gh/bobrenjc93/672/head -> origin/gh/bobrenjc93/672/head 2025-12-04T09:17:18.2264477Z * [new branch] gh/bobrenjc93/672/orig -> origin/gh/bobrenjc93/672/orig 2025-12-04T09:17:18.2267029Z * [new 
branch] gh/bobrenjc93/679/base -> origin/gh/bobrenjc93/679/base 2025-12-04T09:17:18.2269161Z * [new branch] gh/bobrenjc93/679/head -> origin/gh/bobrenjc93/679/head 2025-12-04T09:17:18.2270919Z * [new branch] gh/bobrenjc93/679/orig -> origin/gh/bobrenjc93/679/orig 2025-12-04T09:17:18.2273439Z * [new branch] gh/bobrenjc93/680/base -> origin/gh/bobrenjc93/680/base 2025-12-04T09:17:18.2275324Z * [new branch] gh/bobrenjc93/680/head -> origin/gh/bobrenjc93/680/head 2025-12-04T09:17:18.2277128Z * [new branch] gh/bobrenjc93/680/orig -> origin/gh/bobrenjc93/680/orig 2025-12-04T09:17:18.2279544Z * [new branch] gh/bobrenjc93/681/base -> origin/gh/bobrenjc93/681/base 2025-12-04T09:17:18.2281494Z * [new branch] gh/bobrenjc93/681/head -> origin/gh/bobrenjc93/681/head 2025-12-04T09:17:18.2283289Z * [new branch] gh/bobrenjc93/681/orig -> origin/gh/bobrenjc93/681/orig 2025-12-04T09:17:18.2285679Z * [new branch] gh/bobrenjc93/682/base -> origin/gh/bobrenjc93/682/base 2025-12-04T09:17:18.2287618Z * [new branch] gh/bobrenjc93/682/head -> origin/gh/bobrenjc93/682/head 2025-12-04T09:17:18.2289417Z * [new branch] gh/bobrenjc93/682/orig -> origin/gh/bobrenjc93/682/orig 2025-12-04T09:17:18.2292127Z * [new branch] gh/bobrenjc93/683/base -> origin/gh/bobrenjc93/683/base 2025-12-04T09:17:18.2293661Z * [new branch] gh/bobrenjc93/683/head -> origin/gh/bobrenjc93/683/head 2025-12-04T09:17:18.2295454Z * [new branch] gh/bobrenjc93/683/orig -> origin/gh/bobrenjc93/683/orig 2025-12-04T09:17:18.2297972Z * [new branch] gh/bobrenjc93/684/base -> origin/gh/bobrenjc93/684/base 2025-12-04T09:17:18.2300178Z * [new branch] gh/bobrenjc93/684/head -> origin/gh/bobrenjc93/684/head 2025-12-04T09:17:18.2302193Z * [new branch] gh/bobrenjc93/684/orig -> origin/gh/bobrenjc93/684/orig 2025-12-04T09:17:18.2304791Z * [new branch] gh/bobrenjc93/685/base -> origin/gh/bobrenjc93/685/base 2025-12-04T09:17:18.2306554Z * [new branch] gh/bobrenjc93/685/head -> origin/gh/bobrenjc93/685/head 2025-12-04T09:17:18.2308873Z * [new 
branch] gh/bobrenjc93/685/orig -> origin/gh/bobrenjc93/685/orig 2025-12-04T09:17:18.2314281Z * [new branch] gh/bobrenjc93/686/base -> origin/gh/bobrenjc93/686/base 2025-12-04T09:17:18.2316817Z * [new branch] gh/bobrenjc93/686/head -> origin/gh/bobrenjc93/686/head 2025-12-04T09:17:18.2318908Z * [new branch] gh/bobrenjc93/686/orig -> origin/gh/bobrenjc93/686/orig 2025-12-04T09:17:18.2320962Z * [new branch] gh/bobrenjc93/687/base -> origin/gh/bobrenjc93/687/base 2025-12-04T09:17:18.2323322Z * [new branch] gh/bobrenjc93/687/head -> origin/gh/bobrenjc93/687/head 2025-12-04T09:17:18.2325027Z * [new branch] gh/bobrenjc93/687/orig -> origin/gh/bobrenjc93/687/orig 2025-12-04T09:17:18.2328038Z * [new branch] gh/bobrenjc93/688/base -> origin/gh/bobrenjc93/688/base 2025-12-04T09:17:18.2329911Z * [new branch] gh/bobrenjc93/688/head -> origin/gh/bobrenjc93/688/head 2025-12-04T09:17:18.2331848Z * [new branch] gh/bobrenjc93/688/orig -> origin/gh/bobrenjc93/688/orig 2025-12-04T09:17:18.2334166Z * [new branch] gh/bobrenjc93/689/base -> origin/gh/bobrenjc93/689/base 2025-12-04T09:17:18.2336088Z * [new branch] gh/bobrenjc93/689/head -> origin/gh/bobrenjc93/689/head 2025-12-04T09:17:18.2337915Z * [new branch] gh/bobrenjc93/689/orig -> origin/gh/bobrenjc93/689/orig 2025-12-04T09:17:18.2341071Z * [new branch] gh/bobrenjc93/690/base -> origin/gh/bobrenjc93/690/base 2025-12-04T09:17:18.2342464Z * [new branch] gh/bobrenjc93/690/head -> origin/gh/bobrenjc93/690/head 2025-12-04T09:17:18.2344328Z * [new branch] gh/bobrenjc93/690/orig -> origin/gh/bobrenjc93/690/orig 2025-12-04T09:17:18.2347511Z * [new branch] gh/bobrenjc93/691/base -> origin/gh/bobrenjc93/691/base 2025-12-04T09:17:18.2349599Z * [new branch] gh/bobrenjc93/691/head -> origin/gh/bobrenjc93/691/head 2025-12-04T09:17:18.2351802Z * [new branch] gh/bobrenjc93/691/orig -> origin/gh/bobrenjc93/691/orig 2025-12-04T09:17:18.2355076Z * [new branch] gh/bobrenjc93/692/base -> origin/gh/bobrenjc93/692/base 2025-12-04T09:17:18.2356891Z * [new 
branch] gh/bobrenjc93/692/head -> origin/gh/bobrenjc93/692/head 2025-12-04T09:17:18.2358700Z * [new branch] gh/bobrenjc93/692/orig -> origin/gh/bobrenjc93/692/orig 2025-12-04T09:17:18.2361154Z * [new branch] gh/bobrenjc93/693/base -> origin/gh/bobrenjc93/693/base 2025-12-04T09:17:18.2363025Z * [new branch] gh/bobrenjc93/693/head -> origin/gh/bobrenjc93/693/head 2025-12-04T09:17:18.2364920Z * [new branch] gh/bobrenjc93/693/orig -> origin/gh/bobrenjc93/693/orig 2025-12-04T09:17:18.2367625Z * [new branch] gh/bobrenjc93/694/base -> origin/gh/bobrenjc93/694/base 2025-12-04T09:17:18.2369536Z * [new branch] gh/bobrenjc93/694/head -> origin/gh/bobrenjc93/694/head 2025-12-04T09:17:18.2371346Z * [new branch] gh/bobrenjc93/694/orig -> origin/gh/bobrenjc93/694/orig 2025-12-04T09:17:18.2373780Z * [new branch] gh/bobrenjc93/695/base -> origin/gh/bobrenjc93/695/base 2025-12-04T09:17:18.2375581Z * [new branch] gh/bobrenjc93/695/head -> origin/gh/bobrenjc93/695/head 2025-12-04T09:17:18.2377415Z * [new branch] gh/bobrenjc93/695/orig -> origin/gh/bobrenjc93/695/orig 2025-12-04T09:17:18.2380838Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-12-04T09:17:18.2382761Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-12-04T09:17:18.2385254Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-12-04T09:17:18.2387028Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-12-04T09:17:18.2388820Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-12-04T09:17:18.2391340Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-12-04T09:17:18.2393220Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-12-04T09:17:18.2395167Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-12-04T09:17:18.2397585Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-12-04T09:17:18.2399588Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-12-04T09:17:18.2401412Z * [new branch] gh/c00w/56/orig -> 
origin/gh/c00w/56/orig 2025-12-04T09:17:18.2403750Z * [new branch] gh/c00w/57/base -> origin/gh/c00w/57/base 2025-12-04T09:17:18.2405593Z * [new branch] gh/c00w/57/head -> origin/gh/c00w/57/head 2025-12-04T09:17:18.2407464Z * [new branch] gh/c00w/57/orig -> origin/gh/c00w/57/orig 2025-12-04T09:17:18.2410256Z * [new branch] gh/c00w/58/base -> origin/gh/c00w/58/base 2025-12-04T09:17:18.2411890Z * [new branch] gh/c00w/58/head -> origin/gh/c00w/58/head 2025-12-04T09:17:18.2413690Z * [new branch] gh/c00w/58/orig -> origin/gh/c00w/58/orig 2025-12-04T09:17:18.2416936Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-12-04T09:17:18.2418825Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-12-04T09:17:18.2420865Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-12-04T09:17:18.2424040Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-12-04T09:17:18.2426022Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-12-04T09:17:18.2428855Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-12-04T09:17:18.2430622Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-12-04T09:17:18.2432497Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-12-04T09:17:18.2435136Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-12-04T09:17:18.2437121Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-12-04T09:17:18.2439107Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-12-04T09:17:18.2441719Z * [new branch] gh/coconutruben/70/base -> origin/gh/coconutruben/70/base 2025-12-04T09:17:18.2443566Z * [new branch] gh/coconutruben/70/head -> origin/gh/coconutruben/70/head 2025-12-04T09:17:18.2445507Z * [new branch] gh/coconutruben/70/orig -> origin/gh/coconutruben/70/orig 2025-12-04T09:17:18.2447858Z * [new branch] 
gh/coconutruben/71/base -> origin/gh/coconutruben/71/base 2025-12-04T09:17:18.2449744Z * [new branch] gh/coconutruben/71/head -> origin/gh/coconutruben/71/head 2025-12-04T09:17:18.2451611Z * [new branch] gh/coconutruben/71/orig -> origin/gh/coconutruben/71/orig 2025-12-04T09:17:18.2454558Z * [new branch] gh/coconutruben/72/base -> origin/gh/coconutruben/72/base 2025-12-04T09:17:18.2456212Z * [new branch] gh/coconutruben/72/head -> origin/gh/coconutruben/72/head 2025-12-04T09:17:18.2458152Z * [new branch] gh/coconutruben/72/orig -> origin/gh/coconutruben/72/orig 2025-12-04T09:17:18.2460830Z * [new branch] gh/coconutruben/73/base -> origin/gh/coconutruben/73/base 2025-12-04T09:17:18.2462564Z * [new branch] gh/coconutruben/73/head -> origin/gh/coconutruben/73/head 2025-12-04T09:17:18.2464483Z * [new branch] gh/coconutruben/73/orig -> origin/gh/coconutruben/73/orig 2025-12-04T09:17:18.2467132Z * [new branch] gh/coconutruben/74/base -> origin/gh/coconutruben/74/base 2025-12-04T09:17:18.2469117Z * [new branch] gh/coconutruben/74/head -> origin/gh/coconutruben/74/head 2025-12-04T09:17:18.2470952Z * [new branch] gh/coconutruben/74/orig -> origin/gh/coconutruben/74/orig 2025-12-04T09:17:18.2473590Z * [new branch] gh/coconutruben/79/base -> origin/gh/coconutruben/79/base 2025-12-04T09:17:18.2475665Z * [new branch] gh/coconutruben/79/head -> origin/gh/coconutruben/79/head 2025-12-04T09:17:18.2477371Z * [new branch] gh/coconutruben/79/orig -> origin/gh/coconutruben/79/orig 2025-12-04T09:17:18.2480121Z * [new branch] gh/coconutruben/80/base -> origin/gh/coconutruben/80/base 2025-12-04T09:17:18.2482746Z * [new branch] gh/coconutruben/80/head -> origin/gh/coconutruben/80/head 2025-12-04T09:17:18.2484084Z * [new branch] gh/coconutruben/80/orig -> origin/gh/coconutruben/80/orig 2025-12-04T09:17:18.2486715Z * [new branch] gh/coconutruben/82/base -> origin/gh/coconutruben/82/base 2025-12-04T09:17:18.2488493Z * [new branch] gh/coconutruben/82/head -> origin/gh/coconutruben/82/head 
2025-12-04T09:17:18.2490411Z * [new branch] gh/coconutruben/82/orig -> origin/gh/coconutruben/82/orig 2025-12-04T09:17:18.2493070Z * [new branch] gh/coconutruben/83/base -> origin/gh/coconutruben/83/base 2025-12-04T09:17:18.2494827Z * [new branch] gh/coconutruben/83/head -> origin/gh/coconutruben/83/head 2025-12-04T09:17:18.2496647Z * [new branch] gh/coconutruben/83/orig -> origin/gh/coconutruben/83/orig 2025-12-04T09:17:18.2500208Z * [new branch] gh/coconutruben/84/base -> origin/gh/coconutruben/84/base 2025-12-04T09:17:18.2501910Z * [new branch] gh/coconutruben/84/head -> origin/gh/coconutruben/84/head 2025-12-04T09:17:18.2503701Z * [new branch] gh/coconutruben/84/orig -> origin/gh/coconutruben/84/orig 2025-12-04T09:17:18.2506237Z * [new branch] gh/coconutruben/85/base -> origin/gh/coconutruben/85/base 2025-12-04T09:17:18.2508229Z * [new branch] gh/coconutruben/85/head -> origin/gh/coconutruben/85/head 2025-12-04T09:17:18.2510310Z * [new branch] gh/coconutruben/85/orig -> origin/gh/coconutruben/85/orig 2025-12-04T09:17:18.2513012Z * [new branch] gh/coconutruben/86/base -> origin/gh/coconutruben/86/base 2025-12-04T09:17:18.2515037Z * [new branch] gh/coconutruben/86/head -> origin/gh/coconutruben/86/head 2025-12-04T09:17:18.2516653Z * [new branch] gh/coconutruben/86/orig -> origin/gh/coconutruben/86/orig 2025-12-04T09:17:18.2519723Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-12-04T09:17:18.2521578Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-12-04T09:17:18.2523966Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-12-04T09:17:18.2525796Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-12-04T09:17:18.2528273Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-12-04T09:17:18.2530029Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-12-04T09:17:18.2532349Z * [new branch] gh/colinchan15/6/base -> 
origin/gh/colinchan15/6/base 2025-12-04T09:17:18.2534149Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-12-04T09:17:18.2537300Z * [new branch] gh/d4l3k/1/base -> origin/gh/d4l3k/1/base 2025-12-04T09:17:18.2539184Z * [new branch] gh/d4l3k/1/head -> origin/gh/d4l3k/1/head 2025-12-04T09:17:18.2541777Z * [new branch] gh/d4l3k/2/base -> origin/gh/d4l3k/2/base 2025-12-04T09:17:18.2543608Z * [new branch] gh/d4l3k/2/head -> origin/gh/d4l3k/2/head 2025-12-04T09:17:18.2545394Z * [new branch] gh/d4l3k/2/orig -> origin/gh/d4l3k/2/orig 2025-12-04T09:17:18.2547887Z * [new branch] gh/d4l3k/3/base -> origin/gh/d4l3k/3/base 2025-12-04T09:17:18.2549715Z * [new branch] gh/d4l3k/3/head -> origin/gh/d4l3k/3/head 2025-12-04T09:17:18.2551690Z * [new branch] gh/d4l3k/3/orig -> origin/gh/d4l3k/3/orig 2025-12-04T09:17:18.2554095Z * [new branch] gh/d4l3k/4/base -> origin/gh/d4l3k/4/base 2025-12-04T09:17:18.2555914Z * [new branch] gh/d4l3k/4/head -> origin/gh/d4l3k/4/head 2025-12-04T09:17:18.2557830Z * [new branch] gh/d4l3k/4/orig -> origin/gh/d4l3k/4/orig 2025-12-04T09:17:18.2560347Z * [new branch] gh/d4l3k/5/base -> origin/gh/d4l3k/5/base 2025-12-04T09:17:18.2562283Z * [new branch] gh/d4l3k/5/orig -> origin/gh/d4l3k/5/orig 2025-12-04T09:17:18.2565397Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-12-04T09:17:18.2567290Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-12-04T09:17:18.2569133Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-12-04T09:17:18.2571742Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-12-04T09:17:18.2573647Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-12-04T09:17:18.2575507Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-12-04T09:17:18.2579044Z * [new branch] gh/desertfire/605/base -> origin/gh/desertfire/605/base 
2025-12-04T09:17:18.2581004Z * [new branch] gh/desertfire/605/head -> origin/gh/desertfire/605/head 2025-12-04T09:17:18.2582854Z * [new branch] gh/desertfire/605/orig -> origin/gh/desertfire/605/orig 2025-12-04T09:17:18.2585430Z * [new branch] gh/desertfire/606/base -> origin/gh/desertfire/606/base 2025-12-04T09:17:18.2587249Z * [new branch] gh/desertfire/606/head -> origin/gh/desertfire/606/head 2025-12-04T09:17:18.2589206Z * [new branch] gh/desertfire/606/orig -> origin/gh/desertfire/606/orig 2025-12-04T09:17:18.2591754Z * [new branch] gh/desertfire/607/base -> origin/gh/desertfire/607/base 2025-12-04T09:17:18.2593604Z * [new branch] gh/desertfire/607/head -> origin/gh/desertfire/607/head 2025-12-04T09:17:18.2595500Z * [new branch] gh/desertfire/607/orig -> origin/gh/desertfire/607/orig 2025-12-04T09:17:18.2598098Z * [new branch] gh/desertfire/608/base -> origin/gh/desertfire/608/base 2025-12-04T09:17:18.2599922Z * [new branch] gh/desertfire/608/head -> origin/gh/desertfire/608/head 2025-12-04T09:17:18.2601786Z * [new branch] gh/desertfire/608/orig -> origin/gh/desertfire/608/orig 2025-12-04T09:17:18.2604276Z * [new branch] gh/desertfire/609/base -> origin/gh/desertfire/609/base 2025-12-04T09:17:18.2606094Z * [new branch] gh/desertfire/609/head -> origin/gh/desertfire/609/head 2025-12-04T09:17:18.2608158Z * [new branch] gh/desertfire/609/orig -> origin/gh/desertfire/609/orig 2025-12-04T09:17:18.2610985Z * [new branch] gh/desertfire/610/base -> origin/gh/desertfire/610/base 2025-12-04T09:17:18.2612806Z * [new branch] gh/desertfire/610/head -> origin/gh/desertfire/610/head 2025-12-04T09:17:18.2615025Z * [new branch] gh/desertfire/610/orig -> origin/gh/desertfire/610/orig 2025-12-04T09:17:18.2617043Z * [new branch] gh/desertfire/611/base -> origin/gh/desertfire/611/base 2025-12-04T09:17:18.2618908Z * [new branch] gh/desertfire/611/head -> origin/gh/desertfire/611/head 2025-12-04T09:17:18.2620943Z * [new branch] gh/desertfire/611/orig -> origin/gh/desertfire/611/orig 
2025-12-04T09:17:18.2623547Z * [new branch] gh/desertfire/612/base -> origin/gh/desertfire/612/base 2025-12-04T09:17:18.2625593Z * [new branch] gh/desertfire/612/head -> origin/gh/desertfire/612/head 2025-12-04T09:17:18.2627350Z * [new branch] gh/desertfire/612/orig -> origin/gh/desertfire/612/orig 2025-12-04T09:17:18.2629842Z * [new branch] gh/desertfire/613/base -> origin/gh/desertfire/613/base 2025-12-04T09:17:18.2631737Z * [new branch] gh/desertfire/613/head -> origin/gh/desertfire/613/head 2025-12-04T09:17:18.2633619Z * [new branch] gh/desertfire/613/orig -> origin/gh/desertfire/613/orig 2025-12-04T09:17:18.2642578Z * [new branch] gh/desertfire/614/base -> origin/gh/desertfire/614/base 2025-12-04T09:17:18.2642953Z * [new branch] gh/desertfire/614/head -> origin/gh/desertfire/614/head 2025-12-04T09:17:18.2643202Z * [new branch] gh/desertfire/614/orig -> origin/gh/desertfire/614/orig 2025-12-04T09:17:18.2643425Z * [new branch] gh/desertfire/615/base -> origin/gh/desertfire/615/base 2025-12-04T09:17:18.2644865Z * [new branch] gh/desertfire/615/head -> origin/gh/desertfire/615/head 2025-12-04T09:17:18.2646606Z * [new branch] gh/desertfire/615/orig -> origin/gh/desertfire/615/orig 2025-12-04T09:17:18.2648950Z * [new branch] gh/desertfire/616/base -> origin/gh/desertfire/616/base 2025-12-04T09:17:18.2650898Z * [new branch] gh/desertfire/616/head -> origin/gh/desertfire/616/head 2025-12-04T09:17:18.2652649Z * [new branch] gh/desertfire/616/orig -> origin/gh/desertfire/616/orig 2025-12-04T09:17:18.2654997Z * [new branch] gh/desertfire/617/base -> origin/gh/desertfire/617/base 2025-12-04T09:17:18.2657013Z * [new branch] gh/desertfire/617/head -> origin/gh/desertfire/617/head 2025-12-04T09:17:18.2658840Z * [new branch] gh/desertfire/617/orig -> origin/gh/desertfire/617/orig 2025-12-04T09:17:18.2662133Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-12-04T09:17:18.2663994Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 
2025-12-04T09:17:18.2667072Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-12-04T09:17:18.2669212Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-12-04T09:17:18.2670735Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-12-04T09:17:18.2673245Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-12-04T09:17:18.2675082Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-12-04T09:17:18.2677432Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-12-04T09:17:18.2679155Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-12-04T09:17:18.2681492Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-12-04T09:17:18.2683455Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-12-04T09:17:18.2685947Z * [new branch] gh/drisspg/185/base -> origin/gh/drisspg/185/base 2025-12-04T09:17:18.2687810Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-12-04T09:17:18.2690323Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-12-04T09:17:18.2692169Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-12-04T09:17:18.2693987Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-12-04T09:17:18.2696483Z * [new branch] gh/drisspg/200/base -> origin/gh/drisspg/200/base 2025-12-04T09:17:18.2698522Z * [new branch] gh/drisspg/200/head -> origin/gh/drisspg/200/head 2025-12-04T09:17:18.2700344Z * [new branch] gh/drisspg/200/orig -> origin/gh/drisspg/200/orig 2025-12-04T09:17:18.2702730Z * [new branch] gh/drisspg/218/base -> origin/gh/drisspg/218/base 2025-12-04T09:17:18.2704564Z * [new branch] gh/drisspg/218/head -> origin/gh/drisspg/218/head 2025-12-04T09:17:18.2706424Z * [new branch] gh/drisspg/218/orig -> origin/gh/drisspg/218/orig 2025-12-04T09:17:18.2711569Z * [new branch] gh/drisspg/219/base -> origin/gh/drisspg/219/base 
2025-12-04T09:17:18.2713307Z * [new branch] gh/drisspg/219/head -> origin/gh/drisspg/219/head 2025-12-04T09:17:18.2715216Z * [new branch] gh/drisspg/219/orig -> origin/gh/drisspg/219/orig 2025-12-04T09:17:18.2717615Z * [new branch] gh/drisspg/220/base -> origin/gh/drisspg/220/base 2025-12-04T09:17:18.2719459Z * [new branch] gh/drisspg/220/head -> origin/gh/drisspg/220/head 2025-12-04T09:17:18.2721220Z * [new branch] gh/drisspg/220/orig -> origin/gh/drisspg/220/orig 2025-12-04T09:17:18.2723841Z * [new branch] gh/drisspg/221/base -> origin/gh/drisspg/221/base 2025-12-04T09:17:18.2725714Z * [new branch] gh/drisspg/221/head -> origin/gh/drisspg/221/head 2025-12-04T09:17:18.2727581Z * [new branch] gh/drisspg/221/orig -> origin/gh/drisspg/221/orig 2025-12-04T09:17:18.2730376Z * [new branch] gh/drisspg/222/base -> origin/gh/drisspg/222/base 2025-12-04T09:17:18.2732184Z * [new branch] gh/drisspg/222/head -> origin/gh/drisspg/222/head 2025-12-04T09:17:18.2733977Z * [new branch] gh/drisspg/222/orig -> origin/gh/drisspg/222/orig 2025-12-04T09:17:18.2736580Z * [new branch] gh/drisspg/223/base -> origin/gh/drisspg/223/base 2025-12-04T09:17:18.2738370Z * [new branch] gh/drisspg/223/head -> origin/gh/drisspg/223/head 2025-12-04T09:17:18.2740396Z * [new branch] gh/drisspg/223/orig -> origin/gh/drisspg/223/orig 2025-12-04T09:17:18.2742906Z * [new branch] gh/drisspg/224/base -> origin/gh/drisspg/224/base 2025-12-04T09:17:18.2744716Z * [new branch] gh/drisspg/224/head -> origin/gh/drisspg/224/head 2025-12-04T09:17:18.2746563Z * [new branch] gh/drisspg/224/orig -> origin/gh/drisspg/224/orig 2025-12-04T09:17:18.2749020Z * [new branch] gh/drisspg/225/base -> origin/gh/drisspg/225/base 2025-12-04T09:17:18.2750881Z * [new branch] gh/drisspg/225/head -> origin/gh/drisspg/225/head 2025-12-04T09:17:18.2752697Z * [new branch] gh/drisspg/225/orig -> origin/gh/drisspg/225/orig 2025-12-04T09:17:18.2755203Z * [new branch] gh/drisspg/226/base -> origin/gh/drisspg/226/base 
2025-12-04T09:17:18.2756952Z * [new branch] gh/drisspg/226/head -> origin/gh/drisspg/226/head 2025-12-04T09:17:18.2758760Z * [new branch] gh/drisspg/226/orig -> origin/gh/drisspg/226/orig 2025-12-04T09:17:18.2762026Z * [new branch] gh/drisspg/227/base -> origin/gh/drisspg/227/base 2025-12-04T09:17:18.2763809Z * [new branch] gh/drisspg/227/head -> origin/gh/drisspg/227/head 2025-12-04T09:17:18.2765648Z * [new branch] gh/drisspg/227/orig -> origin/gh/drisspg/227/orig 2025-12-04T09:17:18.2768259Z * [new branch] gh/drisspg/228/base -> origin/gh/drisspg/228/base 2025-12-04T09:17:18.2770060Z * [new branch] gh/drisspg/228/head -> origin/gh/drisspg/228/head 2025-12-04T09:17:18.2771907Z * [new branch] gh/drisspg/228/orig -> origin/gh/drisspg/228/orig 2025-12-04T09:17:18.2774345Z * [new branch] gh/drisspg/229/base -> origin/gh/drisspg/229/base 2025-12-04T09:17:18.2776177Z * [new branch] gh/drisspg/229/head -> origin/gh/drisspg/229/head 2025-12-04T09:17:18.2778152Z * [new branch] gh/drisspg/229/orig -> origin/gh/drisspg/229/orig 2025-12-04T09:17:18.2780861Z * [new branch] gh/drisspg/230/base -> origin/gh/drisspg/230/base 2025-12-04T09:17:18.2782709Z * [new branch] gh/drisspg/230/head -> origin/gh/drisspg/230/head 2025-12-04T09:17:18.2785341Z * [new branch] gh/drisspg/230/orig -> origin/gh/drisspg/230/orig 2025-12-04T09:17:18.2788561Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-12-04T09:17:18.2790395Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-12-04T09:17:18.2793577Z * [new branch] gh/dzmitry-huba/1/base -> origin/gh/dzmitry-huba/1/base 2025-12-04T09:17:18.2795381Z * [new branch] gh/dzmitry-huba/1/head -> origin/gh/dzmitry-huba/1/head 2025-12-04T09:17:18.2798081Z * [new branch] gh/dzmitry-huba/12/base -> origin/gh/dzmitry-huba/12/base 2025-12-04T09:17:18.2800042Z * [new branch] gh/dzmitry-huba/12/head -> origin/gh/dzmitry-huba/12/head 2025-12-04T09:17:18.2801940Z * [new branch] gh/dzmitry-huba/12/orig -> 
origin/gh/dzmitry-huba/12/orig 2025-12-04T09:17:18.2804545Z * [new branch] gh/dzmitry-huba/13/base -> origin/gh/dzmitry-huba/13/base 2025-12-04T09:17:18.2806463Z * [new branch] gh/dzmitry-huba/13/head -> origin/gh/dzmitry-huba/13/head 2025-12-04T09:17:18.2808436Z * [new branch] gh/dzmitry-huba/13/orig -> origin/gh/dzmitry-huba/13/orig 2025-12-04T09:17:18.2811078Z * [new branch] gh/dzmitry-huba/14/base -> origin/gh/dzmitry-huba/14/base 2025-12-04T09:17:18.2812866Z * [new branch] gh/dzmitry-huba/14/head -> origin/gh/dzmitry-huba/14/head 2025-12-04T09:17:18.2814650Z * [new branch] gh/dzmitry-huba/14/orig -> origin/gh/dzmitry-huba/14/orig 2025-12-04T09:17:18.2817317Z * [new branch] gh/dzmitry-huba/15/base -> origin/gh/dzmitry-huba/15/base 2025-12-04T09:17:18.2819278Z * [new branch] gh/dzmitry-huba/15/head -> origin/gh/dzmitry-huba/15/head 2025-12-04T09:17:18.2821029Z * [new branch] gh/dzmitry-huba/15/orig -> origin/gh/dzmitry-huba/15/orig 2025-12-04T09:17:18.2823694Z * [new branch] gh/dzmitry-huba/16/base -> origin/gh/dzmitry-huba/16/base 2025-12-04T09:17:18.2825563Z * [new branch] gh/dzmitry-huba/16/head -> origin/gh/dzmitry-huba/16/head 2025-12-04T09:17:18.2827495Z * [new branch] gh/dzmitry-huba/16/orig -> origin/gh/dzmitry-huba/16/orig 2025-12-04T09:17:18.2830110Z * [new branch] gh/dzmitry-huba/17/base -> origin/gh/dzmitry-huba/17/base 2025-12-04T09:17:18.2831964Z * [new branch] gh/dzmitry-huba/17/head -> origin/gh/dzmitry-huba/17/head 2025-12-04T09:17:18.2833797Z * [new branch] gh/dzmitry-huba/17/orig -> origin/gh/dzmitry-huba/17/orig 2025-12-04T09:17:18.2836216Z * [new branch] gh/dzmitry-huba/2/base -> origin/gh/dzmitry-huba/2/base 2025-12-04T09:17:18.2838002Z * [new branch] gh/dzmitry-huba/2/head -> origin/gh/dzmitry-huba/2/head 2025-12-04T09:17:18.2840410Z * [new branch] gh/dzmitry-huba/3/base -> origin/gh/dzmitry-huba/3/base 2025-12-04T09:17:18.2842172Z * [new branch] gh/dzmitry-huba/3/head -> origin/gh/dzmitry-huba/3/head 2025-12-04T09:17:18.2845372Z * [new 
branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-12-04T09:17:18.2847241Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-12-04T09:17:18.2849111Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-12-04T09:17:18.2851872Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-12-04T09:17:18.2853903Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-12-04T09:17:18.2855534Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-12-04T09:17:18.2858126Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-12-04T09:17:18.2860183Z * [new branch] gh/eellison/823/head -> origin/gh/eellison/823/head 2025-12-04T09:17:18.2861961Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-12-04T09:17:18.2864522Z * [new branch] gh/eellison/862/base -> origin/gh/eellison/862/base 2025-12-04T09:17:18.2866361Z * [new branch] gh/eellison/862/head -> origin/gh/eellison/862/head 2025-12-04T09:17:18.2868197Z * [new branch] gh/eellison/862/orig -> origin/gh/eellison/862/orig 2025-12-04T09:17:18.2870715Z * [new branch] gh/eellison/863/base -> origin/gh/eellison/863/base 2025-12-04T09:17:18.2872732Z * [new branch] gh/eellison/863/head -> origin/gh/eellison/863/head 2025-12-04T09:17:18.2874742Z * [new branch] gh/eellison/863/orig -> origin/gh/eellison/863/orig 2025-12-04T09:17:18.2877199Z * [new branch] gh/eellison/864/base -> origin/gh/eellison/864/base 2025-12-04T09:17:18.2878927Z * [new branch] gh/eellison/864/head -> origin/gh/eellison/864/head 2025-12-04T09:17:18.2881009Z * [new branch] gh/eellison/864/orig -> origin/gh/eellison/864/orig 2025-12-04T09:17:18.2884073Z * [new branch] gh/eellison/865/base -> origin/gh/eellison/865/base 2025-12-04T09:17:18.2886846Z * [new branch] gh/eellison/865/head -> origin/gh/eellison/865/head 2025-12-04T09:17:18.2889375Z * [new branch] gh/eellison/865/orig -> origin/gh/eellison/865/orig 
2025-12-04T09:17:18.2893184Z * [new branch] gh/eellison/866/base -> origin/gh/eellison/866/base 2025-12-04T09:17:18.2895536Z * [new branch] gh/eellison/866/head -> origin/gh/eellison/866/head 2025-12-04T09:17:18.2897940Z * [new branch] gh/eellison/866/orig -> origin/gh/eellison/866/orig 2025-12-04T09:17:18.2901802Z * [new branch] gh/eellison/867/base -> origin/gh/eellison/867/base 2025-12-04T09:17:18.2903975Z * [new branch] gh/eellison/867/head -> origin/gh/eellison/867/head 2025-12-04T09:17:18.2906443Z * [new branch] gh/eellison/867/orig -> origin/gh/eellison/867/orig 2025-12-04T09:17:18.2910642Z * [new branch] gh/eellison/868/base -> origin/gh/eellison/868/base 2025-12-04T09:17:18.2913379Z * [new branch] gh/eellison/868/head -> origin/gh/eellison/868/head 2025-12-04T09:17:18.2915786Z * [new branch] gh/eellison/868/orig -> origin/gh/eellison/868/orig 2025-12-04T09:17:18.2919260Z * [new branch] gh/eellison/869/base -> origin/gh/eellison/869/base 2025-12-04T09:17:18.2921660Z * [new branch] gh/eellison/869/head -> origin/gh/eellison/869/head 2025-12-04T09:17:18.2924056Z * [new branch] gh/eellison/869/orig -> origin/gh/eellison/869/orig 2025-12-04T09:17:18.2927374Z * [new branch] gh/eellison/870/base -> origin/gh/eellison/870/base 2025-12-04T09:17:18.2929824Z * [new branch] gh/eellison/870/head -> origin/gh/eellison/870/head 2025-12-04T09:17:18.2932217Z * [new branch] gh/eellison/870/orig -> origin/gh/eellison/870/orig 2025-12-04T09:17:18.2935961Z * [new branch] gh/eellison/871/base -> origin/gh/eellison/871/base 2025-12-04T09:17:18.2937569Z * [new branch] gh/eellison/871/head -> origin/gh/eellison/871/head 2025-12-04T09:17:18.2939549Z * [new branch] gh/eellison/871/orig -> origin/gh/eellison/871/orig 2025-12-04T09:17:18.2942425Z * [new branch] gh/eellison/872/base -> origin/gh/eellison/872/base 2025-12-04T09:17:18.2944170Z * [new branch] gh/eellison/872/head -> origin/gh/eellison/872/head 2025-12-04T09:17:18.2946056Z * [new branch] gh/eellison/872/orig -> 
origin/gh/eellison/872/orig 2025-12-04T09:17:18.2948934Z * [new branch] gh/eellison/873/base -> origin/gh/eellison/873/base 2025-12-04T09:17:18.2950738Z * [new branch] gh/eellison/873/head -> origin/gh/eellison/873/head 2025-12-04T09:17:18.2952601Z * [new branch] gh/eellison/873/orig -> origin/gh/eellison/873/orig 2025-12-04T09:17:18.2955269Z * [new branch] gh/eellison/874/base -> origin/gh/eellison/874/base 2025-12-04T09:17:18.2957227Z * [new branch] gh/eellison/874/head -> origin/gh/eellison/874/head 2025-12-04T09:17:18.2959109Z * [new branch] gh/eellison/874/orig -> origin/gh/eellison/874/orig 2025-12-04T09:17:18.2962313Z * [new branch] gh/eellison/875/base -> origin/gh/eellison/875/base 2025-12-04T09:17:18.2964213Z * [new branch] gh/eellison/875/head -> origin/gh/eellison/875/head 2025-12-04T09:17:18.2966067Z * [new branch] gh/eellison/875/orig -> origin/gh/eellison/875/orig 2025-12-04T09:17:18.2968725Z * [new branch] gh/eellison/876/base -> origin/gh/eellison/876/base 2025-12-04T09:17:18.2970619Z * [new branch] gh/eellison/876/head -> origin/gh/eellison/876/head 2025-12-04T09:17:18.2972650Z * [new branch] gh/eellison/876/orig -> origin/gh/eellison/876/orig 2025-12-04T09:17:18.2975510Z * [new branch] gh/eellison/877/base -> origin/gh/eellison/877/base 2025-12-04T09:17:18.2977452Z * [new branch] gh/eellison/877/head -> origin/gh/eellison/877/head 2025-12-04T09:17:18.2979268Z * [new branch] gh/eellison/877/orig -> origin/gh/eellison/877/orig 2025-12-04T09:17:18.2982299Z * [new branch] gh/eellison/878/base -> origin/gh/eellison/878/base 2025-12-04T09:17:18.2984076Z * [new branch] gh/eellison/878/head -> origin/gh/eellison/878/head 2025-12-04T09:17:18.2985971Z * [new branch] gh/eellison/878/orig -> origin/gh/eellison/878/orig 2025-12-04T09:17:18.2988649Z * [new branch] gh/eellison/879/base -> origin/gh/eellison/879/base 2025-12-04T09:17:18.2990580Z * [new branch] gh/eellison/879/head -> origin/gh/eellison/879/head 2025-12-04T09:17:18.2993050Z * [new branch] 
gh/eellison/879/orig -> origin/gh/eellison/879/orig 2025-12-04T09:17:18.2995525Z * [new branch] gh/eellison/880/base -> origin/gh/eellison/880/base 2025-12-04T09:17:18.2997420Z * [new branch] gh/eellison/880/head -> origin/gh/eellison/880/head 2025-12-04T09:17:18.2999356Z * [new branch] gh/eellison/880/orig -> origin/gh/eellison/880/orig 2025-12-04T09:17:18.3002600Z * [new branch] gh/eellison/881/base -> origin/gh/eellison/881/base 2025-12-04T09:17:18.3004045Z * [new branch] gh/eellison/881/head -> origin/gh/eellison/881/head 2025-12-04T09:17:18.3005985Z * [new branch] gh/eellison/881/orig -> origin/gh/eellison/881/orig 2025-12-04T09:17:18.3008696Z * [new branch] gh/eellison/882/base -> origin/gh/eellison/882/base 2025-12-04T09:17:18.3010732Z * [new branch] gh/eellison/882/head -> origin/gh/eellison/882/head 2025-12-04T09:17:18.3012817Z * [new branch] gh/eellison/882/orig -> origin/gh/eellison/882/orig 2025-12-04T09:17:18.3015360Z * [new branch] gh/eellison/883/base -> origin/gh/eellison/883/base 2025-12-04T09:17:18.3017195Z * [new branch] gh/eellison/883/head -> origin/gh/eellison/883/head 2025-12-04T09:17:18.3019195Z * [new branch] gh/eellison/883/orig -> origin/gh/eellison/883/orig 2025-12-04T09:17:18.3021833Z * [new branch] gh/eellison/884/base -> origin/gh/eellison/884/base 2025-12-04T09:17:18.3023714Z * [new branch] gh/eellison/884/head -> origin/gh/eellison/884/head 2025-12-04T09:17:18.3025523Z * [new branch] gh/eellison/884/orig -> origin/gh/eellison/884/orig 2025-12-04T09:17:18.3028706Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-12-04T09:17:18.3030598Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-12-04T09:17:18.3033344Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-12-04T09:17:18.3035253Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-12-04T09:17:18.3037054Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-12-04T09:17:18.3039591Z * [new branch] gh/etaf/156/base -> 
origin/gh/etaf/156/base 2025-12-04T09:17:18.3041494Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-12-04T09:17:18.3043375Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-12-04T09:17:18.3046241Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-12-04T09:17:18.3048112Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-12-04T09:17:18.3050003Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-12-04T09:17:18.3052661Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-12-04T09:17:18.3054561Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-12-04T09:17:18.3056435Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-12-04T09:17:18.3059284Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-12-04T09:17:18.3061401Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-12-04T09:17:18.3063342Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-12-04T09:17:18.3066517Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-12-04T09:17:18.3068410Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-12-04T09:17:18.3070260Z * [new branch] gh/etaf/160/orig -> origin/gh/etaf/160/orig 2025-12-04T09:17:18.3072905Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-12-04T09:17:18.3074903Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-12-04T09:17:18.3076793Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-12-04T09:17:18.3079478Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-12-04T09:17:18.3081603Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-12-04T09:17:18.3083490Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-12-04T09:17:18.3085962Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-12-04T09:17:18.3087895Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-12-04T09:17:18.3089761Z * [new 
branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-12-04T09:17:18.3092494Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-12-04T09:17:18.3094389Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-12-04T09:17:18.3096218Z * [new branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-12-04T09:17:18.3098952Z * [new branch] gh/etaf/172/base -> origin/gh/etaf/172/base 2025-12-04T09:17:18.3100989Z * [new branch] gh/etaf/172/head -> origin/gh/etaf/172/head 2025-12-04T09:17:18.3102843Z * [new branch] gh/etaf/172/orig -> origin/gh/etaf/172/orig 2025-12-04T09:17:18.3105713Z * [new branch] gh/etaf/173/base -> origin/gh/etaf/173/base 2025-12-04T09:17:18.3107909Z * [new branch] gh/etaf/173/head -> origin/gh/etaf/173/head 2025-12-04T09:17:18.3112597Z * [new branch] gh/etaf/173/orig -> origin/gh/etaf/173/orig 2025-12-04T09:17:18.3115227Z * [new branch] gh/etaf/174/base -> origin/gh/etaf/174/base 2025-12-04T09:17:18.3117087Z * [new branch] gh/etaf/174/head -> origin/gh/etaf/174/head 2025-12-04T09:17:18.3119684Z * [new branch] gh/etaf/175/base -> origin/gh/etaf/175/base 2025-12-04T09:17:18.3121950Z * [new branch] gh/etaf/175/head -> origin/gh/etaf/175/head 2025-12-04T09:17:18.3123369Z * [new branch] gh/etaf/175/orig -> origin/gh/etaf/175/orig 2025-12-04T09:17:18.3125982Z * [new branch] gh/etaf/176/base -> origin/gh/etaf/176/base 2025-12-04T09:17:18.3127940Z * [new branch] gh/etaf/176/head -> origin/gh/etaf/176/head 2025-12-04T09:17:18.3129774Z * [new branch] gh/etaf/176/orig -> origin/gh/etaf/176/orig 2025-12-04T09:17:18.3132863Z * [new branch] gh/etaf/177/base -> origin/gh/etaf/177/base 2025-12-04T09:17:18.3134967Z * [new branch] gh/etaf/177/head -> origin/gh/etaf/177/head 2025-12-04T09:17:18.3137281Z * [new branch] gh/etaf/177/orig -> origin/gh/etaf/177/orig 2025-12-04T09:17:18.3140189Z * [new branch] gh/etaf/178/base -> origin/gh/etaf/178/base 2025-12-04T09:17:18.3142303Z * [new branch] gh/etaf/178/head -> origin/gh/etaf/178/head 
2025-12-04T09:17:18.3144115Z * [new branch] gh/etaf/178/orig -> origin/gh/etaf/178/orig 2025-12-04T09:17:18.3146789Z * [new branch] gh/etaf/179/base -> origin/gh/etaf/179/base 2025-12-04T09:17:18.3148685Z * [new branch] gh/etaf/179/head -> origin/gh/etaf/179/head 2025-12-04T09:17:18.3150554Z * [new branch] gh/etaf/179/orig -> origin/gh/etaf/179/orig 2025-12-04T09:17:18.3153116Z * [new branch] gh/etaf/180/base -> origin/gh/etaf/180/base 2025-12-04T09:17:18.3155269Z * [new branch] gh/etaf/180/head -> origin/gh/etaf/180/head 2025-12-04T09:17:18.3157117Z * [new branch] gh/etaf/180/orig -> origin/gh/etaf/180/orig 2025-12-04T09:17:18.3160614Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-12-04T09:17:18.3162254Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-12-04T09:17:18.3164651Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-12-04T09:17:18.3166418Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-12-04T09:17:18.3168926Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-12-04T09:17:18.3170854Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-12-04T09:17:18.3173434Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-12-04T09:17:18.3175216Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-12-04T09:17:18.3178635Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-12-04T09:17:18.3180616Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-12-04T09:17:18.3182663Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-12-04T09:17:18.3185156Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-12-04T09:17:18.3187108Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-12-04T09:17:18.3188900Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 
2025-12-04T09:17:18.3191448Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-12-04T09:17:18.3193276Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-12-04T09:17:18.3195137Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-12-04T09:17:18.3197628Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-12-04T09:17:18.3199608Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-12-04T09:17:18.3201456Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-12-04T09:17:18.3204005Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-12-04T09:17:18.3205838Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-12-04T09:17:18.3208019Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-12-04T09:17:18.3210819Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-12-04T09:17:18.3212587Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-12-04T09:17:18.3214458Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-12-04T09:17:18.3217049Z * [new branch] gh/ezyang/3143/base -> origin/gh/ezyang/3143/base 2025-12-04T09:17:18.3218878Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-12-04T09:17:18.3221017Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-12-04T09:17:18.3223684Z * [new branch] gh/ezyang/3144/base -> origin/gh/ezyang/3144/base 2025-12-04T09:17:18.3225564Z * [new branch] gh/ezyang/3144/head -> origin/gh/ezyang/3144/head 2025-12-04T09:17:18.3227411Z * [new branch] gh/ezyang/3144/orig -> origin/gh/ezyang/3144/orig 2025-12-04T09:17:18.3230383Z * [new branch] gh/ezyang/3167/base -> origin/gh/ezyang/3167/base 2025-12-04T09:17:18.3231821Z * [new branch] gh/ezyang/3167/head -> origin/gh/ezyang/3167/head 2025-12-04T09:17:18.3234576Z * [new branch] gh/ezyang/3167/orig -> origin/gh/ezyang/3167/orig 
2025-12-04T09:17:18.3237114Z * [new branch] gh/ezyang/3173/base -> origin/gh/ezyang/3173/base 2025-12-04T09:17:18.3239012Z * [new branch] gh/ezyang/3173/head -> origin/gh/ezyang/3173/head 2025-12-04T09:17:18.3240878Z * [new branch] gh/ezyang/3173/orig -> origin/gh/ezyang/3173/orig 2025-12-04T09:17:18.3243500Z * [new branch] gh/ezyang/3175/base -> origin/gh/ezyang/3175/base 2025-12-04T09:17:18.3246094Z * [new branch] gh/ezyang/3175/head -> origin/gh/ezyang/3175/head 2025-12-04T09:17:18.3247911Z * [new branch] gh/ezyang/3175/orig -> origin/gh/ezyang/3175/orig 2025-12-04T09:17:18.3250501Z * [new branch] gh/ezyang/3182/base -> origin/gh/ezyang/3182/base 2025-12-04T09:17:18.3252386Z * [new branch] gh/ezyang/3182/head -> origin/gh/ezyang/3182/head 2025-12-04T09:17:18.3254324Z * [new branch] gh/ezyang/3182/orig -> origin/gh/ezyang/3182/orig 2025-12-04T09:17:18.3256892Z * [new branch] gh/ezyang/3185/base -> origin/gh/ezyang/3185/base 2025-12-04T09:17:18.3258909Z * [new branch] gh/ezyang/3185/head -> origin/gh/ezyang/3185/head 2025-12-04T09:17:18.3260788Z * [new branch] gh/ezyang/3185/orig -> origin/gh/ezyang/3185/orig 2025-12-04T09:17:18.3263421Z * [new branch] gh/ezyang/3189/base -> origin/gh/ezyang/3189/base 2025-12-04T09:17:18.3265249Z * [new branch] gh/ezyang/3189/head -> origin/gh/ezyang/3189/head 2025-12-04T09:17:18.3267169Z * [new branch] gh/ezyang/3189/orig -> origin/gh/ezyang/3189/orig 2025-12-04T09:17:18.3269753Z * [new branch] gh/ezyang/3191/base -> origin/gh/ezyang/3191/base 2025-12-04T09:17:18.3271624Z * [new branch] gh/ezyang/3191/head -> origin/gh/ezyang/3191/head 2025-12-04T09:17:18.3273522Z * [new branch] gh/ezyang/3191/orig -> origin/gh/ezyang/3191/orig 2025-12-04T09:17:18.3276792Z * [new branch] gh/ezyang/3192/base -> origin/gh/ezyang/3192/base 2025-12-04T09:17:18.3278676Z * [new branch] gh/ezyang/3192/head -> origin/gh/ezyang/3192/head 2025-12-04T09:17:18.3280602Z * [new branch] gh/ezyang/3192/orig -> origin/gh/ezyang/3192/orig 
2025-12-04T09:17:18.3283232Z * [new branch] gh/ezyang/3193/base -> origin/gh/ezyang/3193/base 2025-12-04T09:17:18.3285156Z * [new branch] gh/ezyang/3193/head -> origin/gh/ezyang/3193/head 2025-12-04T09:17:18.3287021Z * [new branch] gh/ezyang/3193/orig -> origin/gh/ezyang/3193/orig 2025-12-04T09:17:18.3289848Z * [new branch] gh/ezyang/3194/base -> origin/gh/ezyang/3194/base 2025-12-04T09:17:18.3291734Z * [new branch] gh/ezyang/3194/head -> origin/gh/ezyang/3194/head 2025-12-04T09:17:18.3293615Z * [new branch] gh/ezyang/3194/orig -> origin/gh/ezyang/3194/orig 2025-12-04T09:17:18.3296215Z * [new branch] gh/ezyang/3195/base -> origin/gh/ezyang/3195/base 2025-12-04T09:17:18.3298125Z * [new branch] gh/ezyang/3195/head -> origin/gh/ezyang/3195/head 2025-12-04T09:17:18.3300044Z * [new branch] gh/ezyang/3195/orig -> origin/gh/ezyang/3195/orig 2025-12-04T09:17:18.3302746Z * [new branch] gh/ezyang/3196/base -> origin/gh/ezyang/3196/base 2025-12-04T09:17:18.3304642Z * [new branch] gh/ezyang/3196/head -> origin/gh/ezyang/3196/head 2025-12-04T09:17:18.3306610Z * [new branch] gh/ezyang/3196/orig -> origin/gh/ezyang/3196/orig 2025-12-04T09:17:18.3309778Z * [new branch] gh/ezyang/3197/base -> origin/gh/ezyang/3197/base 2025-12-04T09:17:18.3311541Z * [new branch] gh/ezyang/3197/head -> origin/gh/ezyang/3197/head 2025-12-04T09:17:18.3313429Z * [new branch] gh/ezyang/3197/orig -> origin/gh/ezyang/3197/orig 2025-12-04T09:17:18.3316258Z * [new branch] gh/ezyang/3198/base -> origin/gh/ezyang/3198/base 2025-12-04T09:17:18.3318171Z * [new branch] gh/ezyang/3198/head -> origin/gh/ezyang/3198/head 2025-12-04T09:17:18.3320072Z * [new branch] gh/ezyang/3198/orig -> origin/gh/ezyang/3198/orig 2025-12-04T09:17:18.3322740Z * [new branch] gh/ezyang/3199/base -> origin/gh/ezyang/3199/base 2025-12-04T09:17:18.3324538Z * [new branch] gh/ezyang/3199/head -> origin/gh/ezyang/3199/head 2025-12-04T09:17:18.3326504Z * [new branch] gh/ezyang/3199/orig -> origin/gh/ezyang/3199/orig 
2025-12-04T09:17:18.3329101Z * [new branch] gh/ezyang/3200/base -> origin/gh/ezyang/3200/base 2025-12-04T09:17:18.3330958Z * [new branch] gh/ezyang/3200/head -> origin/gh/ezyang/3200/head 2025-12-04T09:17:18.3332842Z * [new branch] gh/ezyang/3200/orig -> origin/gh/ezyang/3200/orig 2025-12-04T09:17:18.3335477Z * [new branch] gh/ezyang/3201/base -> origin/gh/ezyang/3201/base 2025-12-04T09:17:18.3337557Z * [new branch] gh/ezyang/3201/head -> origin/gh/ezyang/3201/head 2025-12-04T09:17:18.3339287Z * [new branch] gh/ezyang/3201/orig -> origin/gh/ezyang/3201/orig 2025-12-04T09:17:18.3342181Z * [new branch] gh/ezyang/3202/base -> origin/gh/ezyang/3202/base 2025-12-04T09:17:18.3344024Z * [new branch] gh/ezyang/3202/head -> origin/gh/ezyang/3202/head 2025-12-04T09:17:18.3345939Z * [new branch] gh/ezyang/3202/orig -> origin/gh/ezyang/3202/orig 2025-12-04T09:17:18.3348605Z * [new branch] gh/ezyang/3203/base -> origin/gh/ezyang/3203/base 2025-12-04T09:17:18.3350453Z * [new branch] gh/ezyang/3203/head -> origin/gh/ezyang/3203/head 2025-12-04T09:17:18.3352513Z * [new branch] gh/ezyang/3203/orig -> origin/gh/ezyang/3203/orig 2025-12-04T09:17:18.3355147Z * [new branch] gh/ezyang/3204/base -> origin/gh/ezyang/3204/base 2025-12-04T09:17:18.3357074Z * [new branch] gh/ezyang/3204/head -> origin/gh/ezyang/3204/head 2025-12-04T09:17:18.3358994Z * [new branch] gh/ezyang/3204/orig -> origin/gh/ezyang/3204/orig 2025-12-04T09:17:18.3361607Z * [new branch] gh/ezyang/3205/base -> origin/gh/ezyang/3205/base 2025-12-04T09:17:18.3363432Z * [new branch] gh/ezyang/3205/head -> origin/gh/ezyang/3205/head 2025-12-04T09:17:18.3365287Z * [new branch] gh/ezyang/3205/orig -> origin/gh/ezyang/3205/orig 2025-12-04T09:17:18.3368065Z * [new branch] gh/ezyang/3206/base -> origin/gh/ezyang/3206/base 2025-12-04T09:17:18.3369959Z * [new branch] gh/ezyang/3206/head -> origin/gh/ezyang/3206/head 2025-12-04T09:17:18.3371814Z * [new branch] gh/ezyang/3206/orig -> origin/gh/ezyang/3206/orig 
2025-12-04T09:17:18.3374503Z * [new branch] gh/ezyang/3207/base -> origin/gh/ezyang/3207/base 2025-12-04T09:17:18.3376276Z * [new branch] gh/ezyang/3207/head -> origin/gh/ezyang/3207/head 2025-12-04T09:17:18.3378170Z * [new branch] gh/ezyang/3207/orig -> origin/gh/ezyang/3207/orig 2025-12-04T09:17:18.3381028Z * [new branch] gh/ezyang/3208/base -> origin/gh/ezyang/3208/base 2025-12-04T09:17:18.3382855Z * [new branch] gh/ezyang/3208/head -> origin/gh/ezyang/3208/head 2025-12-04T09:17:18.3384791Z * [new branch] gh/ezyang/3208/orig -> origin/gh/ezyang/3208/orig 2025-12-04T09:17:18.3387459Z * [new branch] gh/ezyang/3209/base -> origin/gh/ezyang/3209/base 2025-12-04T09:17:18.3389411Z * [new branch] gh/ezyang/3209/head -> origin/gh/ezyang/3209/head 2025-12-04T09:17:18.3391258Z * [new branch] gh/ezyang/3209/orig -> origin/gh/ezyang/3209/orig 2025-12-04T09:17:18.3394509Z * [new branch] gh/fadara01/3/base -> origin/gh/fadara01/3/base 2025-12-04T09:17:18.3396334Z * [new branch] gh/fadara01/3/head -> origin/gh/fadara01/3/head 2025-12-04T09:17:18.3398193Z * [new branch] gh/fadara01/3/orig -> origin/gh/fadara01/3/orig 2025-12-04T09:17:18.3400785Z * [new branch] gh/fadara01/5/base -> origin/gh/fadara01/5/base 2025-12-04T09:17:18.3402658Z * [new branch] gh/fadara01/5/head -> origin/gh/fadara01/5/head 2025-12-04T09:17:18.3404561Z * [new branch] gh/fadara01/5/orig -> origin/gh/fadara01/5/orig 2025-12-04T09:17:18.3407059Z * [new branch] gh/fadara01/6/base -> origin/gh/fadara01/6/base 2025-12-04T09:17:18.3409349Z * [new branch] gh/fadara01/6/head -> origin/gh/fadara01/6/head 2025-12-04T09:17:18.3411203Z * [new branch] gh/fadara01/6/orig -> origin/gh/fadara01/6/orig 2025-12-04T09:17:18.3413888Z * [new branch] gh/fadara01/7/base -> origin/gh/fadara01/7/base 2025-12-04T09:17:18.3415661Z * [new branch] gh/fadara01/7/head -> origin/gh/fadara01/7/head 2025-12-04T09:17:18.3417626Z * [new branch] gh/fadara01/7/orig -> origin/gh/fadara01/7/orig 2025-12-04T09:17:18.3420534Z * [new branch] 
gh/fadara01/8/base -> origin/gh/fadara01/8/base 2025-12-04T09:17:18.3422328Z * [new branch] gh/fadara01/8/head -> origin/gh/fadara01/8/head 2025-12-04T09:17:18.3424203Z * [new branch] gh/fadara01/8/orig -> origin/gh/fadara01/8/orig 2025-12-04T09:17:18.3426732Z * [new branch] gh/fadara01/9/base -> origin/gh/fadara01/9/base 2025-12-04T09:17:18.3428558Z * [new branch] gh/fadara01/9/head -> origin/gh/fadara01/9/head 2025-12-04T09:17:18.3430429Z * [new branch] gh/fadara01/9/orig -> origin/gh/fadara01/9/orig 2025-12-04T09:17:18.3433495Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-12-04T09:17:18.3435310Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-12-04T09:17:18.3437132Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-12-04T09:17:18.3439683Z * [new branch] gh/fduwjj/211/base -> origin/gh/fduwjj/211/base 2025-12-04T09:17:18.3441713Z * [new branch] gh/fduwjj/211/head -> origin/gh/fduwjj/211/head 2025-12-04T09:17:18.3443556Z * [new branch] gh/fduwjj/211/orig -> origin/gh/fduwjj/211/orig 2025-12-04T09:17:18.3446061Z * [new branch] gh/fduwjj/212/base -> origin/gh/fduwjj/212/base 2025-12-04T09:17:18.3447903Z * [new branch] gh/fduwjj/212/head -> origin/gh/fduwjj/212/head 2025-12-04T09:17:18.3449699Z * [new branch] gh/fduwjj/212/orig -> origin/gh/fduwjj/212/orig 2025-12-04T09:17:18.3452175Z * [new branch] gh/fduwjj/213/base -> origin/gh/fduwjj/213/base 2025-12-04T09:17:18.3454021Z * [new branch] gh/fduwjj/213/head -> origin/gh/fduwjj/213/head 2025-12-04T09:17:18.3455890Z * [new branch] gh/fduwjj/213/orig -> origin/gh/fduwjj/213/orig 2025-12-04T09:17:18.3458447Z * [new branch] gh/fduwjj/226/base -> origin/gh/fduwjj/226/base 2025-12-04T09:17:18.3460419Z * [new branch] gh/fduwjj/226/head -> origin/gh/fduwjj/226/head 2025-12-04T09:17:18.3462275Z * [new branch] gh/fduwjj/226/orig -> origin/gh/fduwjj/226/orig 2025-12-04T09:17:18.3465050Z * [new branch] gh/fduwjj/229/base -> origin/gh/fduwjj/229/base 
2025-12-04T09:17:18.3466938Z * [new branch] gh/fduwjj/229/head -> origin/gh/fduwjj/229/head 2025-12-04T09:17:18.3468837Z * [new branch] gh/fduwjj/229/orig -> origin/gh/fduwjj/229/orig 2025-12-04T09:17:18.3472019Z * [new branch] gh/fduwjj/233/base -> origin/gh/fduwjj/233/base 2025-12-04T09:17:18.3473836Z * [new branch] gh/fduwjj/233/head -> origin/gh/fduwjj/233/head 2025-12-04T09:17:18.3475722Z * [new branch] gh/fduwjj/233/orig -> origin/gh/fduwjj/233/orig 2025-12-04T09:17:18.3478287Z * [new branch] gh/fduwjj/234/base -> origin/gh/fduwjj/234/base 2025-12-04T09:17:18.3480155Z * [new branch] gh/fduwjj/234/head -> origin/gh/fduwjj/234/head 2025-12-04T09:17:18.3482033Z * [new branch] gh/fduwjj/234/orig -> origin/gh/fduwjj/234/orig 2025-12-04T09:17:18.3484810Z * [new branch] gh/fduwjj/235/base -> origin/gh/fduwjj/235/base 2025-12-04T09:17:18.3486809Z * [new branch] gh/fduwjj/235/head -> origin/gh/fduwjj/235/head 2025-12-04T09:17:18.3488627Z * [new branch] gh/fduwjj/235/orig -> origin/gh/fduwjj/235/orig 2025-12-04T09:17:18.3491230Z * [new branch] gh/fduwjj/236/base -> origin/gh/fduwjj/236/base 2025-12-04T09:17:18.3493043Z * [new branch] gh/fduwjj/236/head -> origin/gh/fduwjj/236/head 2025-12-04T09:17:18.3494955Z * [new branch] gh/fduwjj/236/orig -> origin/gh/fduwjj/236/orig 2025-12-04T09:17:18.3497353Z * [new branch] gh/fduwjj/237/base -> origin/gh/fduwjj/237/base 2025-12-04T09:17:18.3499226Z * [new branch] gh/fduwjj/237/head -> origin/gh/fduwjj/237/head 2025-12-04T09:17:18.3501229Z * [new branch] gh/fduwjj/237/orig -> origin/gh/fduwjj/237/orig 2025-12-04T09:17:18.3503697Z * [new branch] gh/fduwjj/238/base -> origin/gh/fduwjj/238/base 2025-12-04T09:17:18.3505650Z * [new branch] gh/fduwjj/238/head -> origin/gh/fduwjj/238/head 2025-12-04T09:17:18.3507512Z * [new branch] gh/fduwjj/238/orig -> origin/gh/fduwjj/238/orig 2025-12-04T09:17:18.3513205Z * [new branch] gh/fduwjj/239/base -> origin/gh/fduwjj/239/base 2025-12-04T09:17:18.3515118Z * [new branch] gh/fduwjj/239/head -> 
origin/gh/fduwjj/239/head 2025-12-04T09:17:18.3516932Z * [new branch] gh/fduwjj/239/orig -> origin/gh/fduwjj/239/orig 2025-12-04T09:17:18.3520631Z * [new branch] gh/fegin/332/base -> origin/gh/fegin/332/base 2025-12-04T09:17:18.3522465Z * [new branch] gh/fegin/332/head -> origin/gh/fegin/332/head 2025-12-04T09:17:18.3524323Z * [new branch] gh/fegin/332/orig -> origin/gh/fegin/332/orig 2025-12-04T09:17:18.3526790Z * [new branch] gh/fegin/333/base -> origin/gh/fegin/333/base 2025-12-04T09:17:18.3528648Z * [new branch] gh/fegin/333/head -> origin/gh/fegin/333/head 2025-12-04T09:17:18.3530523Z * [new branch] gh/fegin/333/orig -> origin/gh/fegin/333/orig 2025-12-04T09:17:18.3533048Z * [new branch] gh/fegin/334/base -> origin/gh/fegin/334/base 2025-12-04T09:17:18.3534884Z * [new branch] gh/fegin/334/head -> origin/gh/fegin/334/head 2025-12-04T09:17:18.3536825Z * [new branch] gh/fegin/334/orig -> origin/gh/fegin/334/orig 2025-12-04T09:17:18.3539356Z * [new branch] gh/fegin/335/base -> origin/gh/fegin/335/base 2025-12-04T09:17:18.3541358Z * [new branch] gh/fegin/335/head -> origin/gh/fegin/335/head 2025-12-04T09:17:18.3543161Z * [new branch] gh/fegin/335/orig -> origin/gh/fegin/335/orig 2025-12-04T09:17:18.3546263Z * [new branch] gh/fffrog/160/base -> origin/gh/fffrog/160/base 2025-12-04T09:17:18.3548139Z * [new branch] gh/fffrog/160/head -> origin/gh/fffrog/160/head 2025-12-04T09:17:18.3550634Z * [new branch] gh/fffrog/177/base -> origin/gh/fffrog/177/base 2025-12-04T09:17:18.3552405Z * [new branch] gh/fffrog/177/head -> origin/gh/fffrog/177/head 2025-12-04T09:17:18.3554353Z * [new branch] gh/fffrog/177/orig -> origin/gh/fffrog/177/orig 2025-12-04T09:17:18.3557349Z * [new branch] gh/fffrog/178/base -> origin/gh/fffrog/178/base 2025-12-04T09:17:18.3559128Z * [new branch] gh/fffrog/178/head -> origin/gh/fffrog/178/head 2025-12-04T09:17:18.3561099Z * [new branch] gh/fffrog/178/orig -> origin/gh/fffrog/178/orig 2025-12-04T09:17:18.3563615Z * [new branch] gh/fffrog/181/base -> 
origin/gh/fffrog/181/base 2025-12-04T09:17:18.3565467Z * [new branch] gh/fffrog/181/head -> origin/gh/fffrog/181/head 2025-12-04T09:17:18.3567357Z * [new branch] gh/fffrog/181/orig -> origin/gh/fffrog/181/orig 2025-12-04T09:17:18.3570148Z * [new branch] gh/fffrog/183/base -> origin/gh/fffrog/183/base 2025-12-04T09:17:18.3571846Z * [new branch] gh/fffrog/183/head -> origin/gh/fffrog/183/head 2025-12-04T09:17:18.3573553Z * [new branch] gh/fffrog/183/orig -> origin/gh/fffrog/183/orig 2025-12-04T09:17:18.3576723Z * [new branch] gh/fxdawnn/10/base -> origin/gh/fxdawnn/10/base 2025-12-04T09:17:18.3578868Z * [new branch] gh/fxdawnn/10/head -> origin/gh/fxdawnn/10/head 2025-12-04T09:17:18.3581141Z * [new branch] gh/fxdawnn/10/orig -> origin/gh/fxdawnn/10/orig 2025-12-04T09:17:18.3584415Z * [new branch] gh/fxdawnn/11/base -> origin/gh/fxdawnn/11/base 2025-12-04T09:17:18.3586783Z * [new branch] gh/fxdawnn/11/head -> origin/gh/fxdawnn/11/head 2025-12-04T09:17:18.3589574Z * [new branch] gh/fxdawnn/11/orig -> origin/gh/fxdawnn/11/orig 2025-12-04T09:17:18.3591872Z * [new branch] gh/fxdawnn/12/base -> origin/gh/fxdawnn/12/base 2025-12-04T09:17:18.3593781Z * [new branch] gh/fxdawnn/12/head -> origin/gh/fxdawnn/12/head 2025-12-04T09:17:18.3595610Z * [new branch] gh/fxdawnn/12/orig -> origin/gh/fxdawnn/12/orig 2025-12-04T09:17:18.3598204Z * [new branch] gh/fxdawnn/13/base -> origin/gh/fxdawnn/13/base 2025-12-04T09:17:18.3600108Z * [new branch] gh/fxdawnn/13/head -> origin/gh/fxdawnn/13/head 2025-12-04T09:17:18.3601936Z * [new branch] gh/fxdawnn/13/orig -> origin/gh/fxdawnn/13/orig 2025-12-04T09:17:18.3604563Z * [new branch] gh/fxdawnn/14/base -> origin/gh/fxdawnn/14/base 2025-12-04T09:17:18.3606373Z * [new branch] gh/fxdawnn/14/head -> origin/gh/fxdawnn/14/head 2025-12-04T09:17:18.3608470Z * [new branch] gh/fxdawnn/14/orig -> origin/gh/fxdawnn/14/orig 2025-12-04T09:17:18.3610950Z * [new branch] gh/fxdawnn/15/base -> origin/gh/fxdawnn/15/base 2025-12-04T09:17:18.3612796Z * [new 
branch] gh/fxdawnn/15/head -> origin/gh/fxdawnn/15/head 2025-12-04T09:17:18.3614514Z * [new branch] gh/fxdawnn/15/orig -> origin/gh/fxdawnn/15/orig 2025-12-04T09:17:18.3617022Z * [new branch] gh/fxdawnn/6/base -> origin/gh/fxdawnn/6/base 2025-12-04T09:17:18.3618852Z * [new branch] gh/fxdawnn/6/head -> origin/gh/fxdawnn/6/head 2025-12-04T09:17:18.3621003Z * [new branch] gh/fxdawnn/6/orig -> origin/gh/fxdawnn/6/orig 2025-12-04T09:17:18.3623594Z * [new branch] gh/fxdawnn/7/base -> origin/gh/fxdawnn/7/base 2025-12-04T09:17:18.3625535Z * [new branch] gh/fxdawnn/7/head -> origin/gh/fxdawnn/7/head 2025-12-04T09:17:18.3627212Z * [new branch] gh/fxdawnn/7/orig -> origin/gh/fxdawnn/7/orig 2025-12-04T09:17:18.3630185Z * [new branch] gh/fxdawnn/9/base -> origin/gh/fxdawnn/9/base 2025-12-04T09:17:18.3631595Z * [new branch] gh/fxdawnn/9/head -> origin/gh/fxdawnn/9/head 2025-12-04T09:17:18.3633438Z * [new branch] gh/fxdawnn/9/orig -> origin/gh/fxdawnn/9/orig 2025-12-04T09:17:18.3636504Z * [new branch] gh/galv/1/base -> origin/gh/galv/1/base 2025-12-04T09:17:18.3638396Z * [new branch] gh/galv/1/head -> origin/gh/galv/1/head 2025-12-04T09:17:18.3640293Z * [new branch] gh/galv/1/orig -> origin/gh/galv/1/orig 2025-12-04T09:17:18.3642790Z * [new branch] gh/galv/2/base -> origin/gh/galv/2/base 2025-12-04T09:17:18.3644694Z * [new branch] gh/galv/2/head -> origin/gh/galv/2/head 2025-12-04T09:17:18.3646716Z * [new branch] gh/galv/2/orig -> origin/gh/galv/2/orig 2025-12-04T09:17:18.3649338Z * [new branch] gh/galv/3/base -> origin/gh/galv/3/base 2025-12-04T09:17:18.3651035Z * [new branch] gh/galv/3/head -> origin/gh/galv/3/head 2025-12-04T09:17:18.3652988Z * [new branch] gh/galv/3/orig -> origin/gh/galv/3/orig 2025-12-04T09:17:18.3656038Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-12-04T09:17:18.3657923Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-12-04T09:17:18.3659836Z * [new branch] gh/guangyey/134/orig -> 
origin/gh/guangyey/134/orig 2025-12-04T09:17:18.3662421Z * [new branch] gh/guangyey/163/base -> origin/gh/guangyey/163/base 2025-12-04T09:17:18.3664252Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-12-04T09:17:18.3666138Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-12-04T09:17:18.3668772Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-12-04T09:17:18.3670714Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-12-04T09:17:18.3672546Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-12-04T09:17:18.3675662Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-12-04T09:17:18.3677556Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-12-04T09:17:18.3679430Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-12-04T09:17:18.3681994Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-12-04T09:17:18.3683882Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-12-04T09:17:18.3685692Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-12-04T09:17:18.3688212Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-12-04T09:17:18.3690114Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-12-04T09:17:18.3691995Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-12-04T09:17:18.3694524Z * [new branch] gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-12-04T09:17:18.3696668Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-12-04T09:17:18.3698535Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-12-04T09:17:18.3701317Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-12-04T09:17:18.3703195Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-12-04T09:17:18.3705050Z * [new branch] 
gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-12-04T09:17:18.3707550Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-12-04T09:17:18.3709698Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-12-04T09:17:18.3711610Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-12-04T09:17:18.3714139Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-12-04T09:17:18.3715954Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-12-04T09:17:18.3717846Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-12-04T09:17:18.3720500Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-12-04T09:17:18.3722418Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-12-04T09:17:18.3724347Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 2025-12-04T09:17:18.3726882Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-12-04T09:17:18.3728711Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-12-04T09:17:18.3730588Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-12-04T09:17:18.3733267Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-12-04T09:17:18.3736225Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-12-04T09:17:18.3738199Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-12-04T09:17:18.3740605Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-12-04T09:17:18.3742703Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-12-04T09:17:18.3745080Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-12-04T09:17:18.3748039Z * [new branch] gh/guangyey/208/base -> origin/gh/guangyey/208/base 2025-12-04T09:17:18.3748915Z * [new branch] gh/guangyey/208/head -> origin/gh/guangyey/208/head 
2025-12-04T09:17:18.3751096Z * [new branch] gh/guangyey/208/orig -> origin/gh/guangyey/208/orig 2025-12-04T09:17:18.3753507Z * [new branch] gh/guangyey/228/base -> origin/gh/guangyey/228/base 2025-12-04T09:17:18.3755386Z * [new branch] gh/guangyey/228/head -> origin/gh/guangyey/228/head 2025-12-04T09:17:18.3757263Z * [new branch] gh/guangyey/228/orig -> origin/gh/guangyey/228/orig 2025-12-04T09:17:18.3760484Z * [new branch] gh/guangyey/230/base -> origin/gh/guangyey/230/base 2025-12-04T09:17:18.3762336Z * [new branch] gh/guangyey/230/head -> origin/gh/guangyey/230/head 2025-12-04T09:17:18.3764125Z * [new branch] gh/guangyey/230/orig -> origin/gh/guangyey/230/orig 2025-12-04T09:17:18.3766813Z * [new branch] gh/guangyey/231/base -> origin/gh/guangyey/231/base 2025-12-04T09:17:18.3768640Z * [new branch] gh/guangyey/231/head -> origin/gh/guangyey/231/head 2025-12-04T09:17:18.3770591Z * [new branch] gh/guangyey/231/orig -> origin/gh/guangyey/231/orig 2025-12-04T09:17:18.3773200Z * [new branch] gh/guangyey/232/base -> origin/gh/guangyey/232/base 2025-12-04T09:17:18.3775023Z * [new branch] gh/guangyey/232/head -> origin/gh/guangyey/232/head 2025-12-04T09:17:18.3776813Z * [new branch] gh/guangyey/232/orig -> origin/gh/guangyey/232/orig 2025-12-04T09:17:18.3779524Z * [new branch] gh/guangyey/233/base -> origin/gh/guangyey/233/base 2025-12-04T09:17:18.3781393Z * [new branch] gh/guangyey/233/head -> origin/gh/guangyey/233/head 2025-12-04T09:17:18.3783196Z * [new branch] gh/guangyey/233/orig -> origin/gh/guangyey/233/orig 2025-12-04T09:17:18.3785747Z * [new branch] gh/guangyey/234/base -> origin/gh/guangyey/234/base 2025-12-04T09:17:18.3787626Z * [new branch] gh/guangyey/234/head -> origin/gh/guangyey/234/head 2025-12-04T09:17:18.3789443Z * [new branch] gh/guangyey/234/orig -> origin/gh/guangyey/234/orig 2025-12-04T09:17:18.3792068Z * [new branch] gh/guangyey/235/base -> origin/gh/guangyey/235/base 2025-12-04T09:17:18.3793861Z * [new branch] gh/guangyey/235/head -> 
origin/gh/guangyey/235/head 2025-12-04T09:17:18.3795712Z * [new branch] gh/guangyey/235/orig -> origin/gh/guangyey/235/orig 2025-12-04T09:17:18.3798277Z * [new branch] gh/guangyey/236/base -> origin/gh/guangyey/236/base 2025-12-04T09:17:18.3800371Z * [new branch] gh/guangyey/236/head -> origin/gh/guangyey/236/head 2025-12-04T09:17:18.3802146Z * [new branch] gh/guangyey/236/orig -> origin/gh/guangyey/236/orig 2025-12-04T09:17:18.3804720Z * [new branch] gh/guangyey/237/base -> origin/gh/guangyey/237/base 2025-12-04T09:17:18.3806966Z * [new branch] gh/guangyey/237/head -> origin/gh/guangyey/237/head 2025-12-04T09:17:18.3808891Z * [new branch] gh/guangyey/237/orig -> origin/gh/guangyey/237/orig 2025-12-04T09:17:18.3811621Z * [new branch] gh/guangyey/238/base -> origin/gh/guangyey/238/base 2025-12-04T09:17:18.3813431Z * [new branch] gh/guangyey/238/head -> origin/gh/guangyey/238/head 2025-12-04T09:17:18.3816009Z * [new branch] gh/guangyey/239/base -> origin/gh/guangyey/239/base 2025-12-04T09:17:18.3817844Z * [new branch] gh/guangyey/239/head -> origin/gh/guangyey/239/head 2025-12-04T09:17:18.3819748Z * [new branch] gh/guangyey/239/orig -> origin/gh/guangyey/239/orig 2025-12-04T09:17:18.3822367Z * [new branch] gh/guangyey/240/base -> origin/gh/guangyey/240/base 2025-12-04T09:17:18.3824147Z * [new branch] gh/guangyey/240/head -> origin/gh/guangyey/240/head 2025-12-04T09:17:18.3826149Z * [new branch] gh/guangyey/240/orig -> origin/gh/guangyey/240/orig 2025-12-04T09:17:18.3828710Z * [new branch] gh/guangyey/241/base -> origin/gh/guangyey/241/base 2025-12-04T09:17:18.3830569Z * [new branch] gh/guangyey/241/head -> origin/gh/guangyey/241/head 2025-12-04T09:17:18.3832526Z * [new branch] gh/guangyey/241/orig -> origin/gh/guangyey/241/orig 2025-12-04T09:17:18.3835062Z * [new branch] gh/guangyey/242/base -> origin/gh/guangyey/242/base 2025-12-04T09:17:18.3836924Z * [new branch] gh/guangyey/242/head -> origin/gh/guangyey/242/head 2025-12-04T09:17:18.3838751Z * [new branch] 
gh/guangyey/242/orig -> origin/gh/guangyey/242/orig 2025-12-04T09:17:18.3841409Z * [new branch] gh/guangyey/243/base -> origin/gh/guangyey/243/base 2025-12-04T09:17:18.3843235Z * [new branch] gh/guangyey/243/head -> origin/gh/guangyey/243/head 2025-12-04T09:17:18.3845043Z * [new branch] gh/guangyey/243/orig -> origin/gh/guangyey/243/orig 2025-12-04T09:17:18.3847719Z * [new branch] gh/guangyey/244/base -> origin/gh/guangyey/244/base 2025-12-04T09:17:18.3849548Z * [new branch] gh/guangyey/244/head -> origin/gh/guangyey/244/head 2025-12-04T09:17:18.3851474Z * [new branch] gh/guangyey/244/orig -> origin/gh/guangyey/244/orig 2025-12-04T09:17:18.3854065Z * [new branch] gh/guangyey/245/base -> origin/gh/guangyey/245/base 2025-12-04T09:17:18.3855898Z * [new branch] gh/guangyey/245/head -> origin/gh/guangyey/245/head 2025-12-04T09:17:18.3857732Z * [new branch] gh/guangyey/245/orig -> origin/gh/guangyey/245/orig 2025-12-04T09:17:18.3860808Z * [new branch] gh/guangyey/246/base -> origin/gh/guangyey/246/base 2025-12-04T09:17:18.3862093Z * [new branch] gh/guangyey/246/head -> origin/gh/guangyey/246/head 2025-12-04T09:17:18.3864092Z * [new branch] gh/guangyey/246/orig -> origin/gh/guangyey/246/orig 2025-12-04T09:17:18.3866694Z * [new branch] gh/guangyey/247/base -> origin/gh/guangyey/247/base 2025-12-04T09:17:18.3868559Z * [new branch] gh/guangyey/247/head -> origin/gh/guangyey/247/head 2025-12-04T09:17:18.3870370Z * [new branch] gh/guangyey/247/orig -> origin/gh/guangyey/247/orig 2025-12-04T09:17:18.3873022Z * [new branch] gh/guangyey/248/base -> origin/gh/guangyey/248/base 2025-12-04T09:17:18.3875020Z * [new branch] gh/guangyey/248/head -> origin/gh/guangyey/248/head 2025-12-04T09:17:18.3876810Z * [new branch] gh/guangyey/248/orig -> origin/gh/guangyey/248/orig 2025-12-04T09:17:18.3879337Z * [new branch] gh/guangyey/249/base -> origin/gh/guangyey/249/base 2025-12-04T09:17:18.3881341Z * [new branch] gh/guangyey/249/head -> origin/gh/guangyey/249/head 
2025-12-04T09:17:18.3883175Z * [new branch] gh/guangyey/249/orig -> origin/gh/guangyey/249/orig 2025-12-04T09:17:18.3886330Z * [new branch] gh/guangyey/250/base -> origin/gh/guangyey/250/base 2025-12-04T09:17:18.3888209Z * [new branch] gh/guangyey/250/head -> origin/gh/guangyey/250/head 2025-12-04T09:17:18.3890027Z * [new branch] gh/guangyey/250/orig -> origin/gh/guangyey/250/orig 2025-12-04T09:17:18.3892517Z * [new branch] gh/guangyey/251/base -> origin/gh/guangyey/251/base 2025-12-04T09:17:18.3894356Z * [new branch] gh/guangyey/251/head -> origin/gh/guangyey/251/head 2025-12-04T09:17:18.3896155Z * [new branch] gh/guangyey/251/orig -> origin/gh/guangyey/251/orig 2025-12-04T09:17:18.3898838Z * [new branch] gh/guangyey/252/base -> origin/gh/guangyey/252/base 2025-12-04T09:17:18.3900765Z * [new branch] gh/guangyey/252/head -> origin/gh/guangyey/252/head 2025-12-04T09:17:18.3902700Z * [new branch] gh/guangyey/252/orig -> origin/gh/guangyey/252/orig 2025-12-04T09:17:18.3905249Z * [new branch] gh/guangyey/253/base -> origin/gh/guangyey/253/base 2025-12-04T09:17:18.3907041Z * [new branch] gh/guangyey/253/head -> origin/gh/guangyey/253/head 2025-12-04T09:17:18.3908878Z * [new branch] gh/guangyey/253/orig -> origin/gh/guangyey/253/orig 2025-12-04T09:17:18.3913716Z * [new branch] gh/guangyey/254/base -> origin/gh/guangyey/254/base 2025-12-04T09:17:18.3915582Z * [new branch] gh/guangyey/254/head -> origin/gh/guangyey/254/head 2025-12-04T09:17:18.3917374Z * [new branch] gh/guangyey/254/orig -> origin/gh/guangyey/254/orig 2025-12-04T09:17:18.3923786Z * [new branch] gh/guangyey/255/base -> origin/gh/guangyey/255/base 2025-12-04T09:17:18.3924603Z * [new branch] gh/guangyey/255/head -> origin/gh/guangyey/255/head 2025-12-04T09:17:18.3925143Z * [new branch] gh/guangyey/255/orig -> origin/gh/guangyey/255/orig 2025-12-04T09:17:18.3926939Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-12-04T09:17:18.3929245Z * [new branch] 
gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 2025-12-04T09:17:18.3930520Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-12-04T09:17:18.3933133Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-12-04T09:17:18.3934955Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-12-04T09:17:18.3937009Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-12-04T09:17:18.3940194Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-12-04T09:17:18.3944413Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-12-04T09:17:18.3947816Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-12-04T09:17:18.3950856Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-12-04T09:17:18.3953291Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-12-04T09:17:18.3955969Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-12-04T09:17:18.3958576Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-12-04T09:17:18.3960363Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-12-04T09:17:18.3962222Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-12-04T09:17:18.3964873Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-12-04T09:17:18.3966772Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-12-04T09:17:18.3969134Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-12-04T09:17:18.3971755Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-12-04T09:17:18.3973627Z * [new branch] 
gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-12-04T09:17:18.3975525Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-12-04T09:17:18.3978046Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-12-04T09:17:18.3980143Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-12-04T09:17:18.3981947Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-12-04T09:17:18.3984535Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-12-04T09:17:18.3986714Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-12-04T09:17:18.3988745Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-12-04T09:17:18.3991309Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-12-04T09:17:18.3993182Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-12-04T09:17:18.3995069Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-12-04T09:17:18.3997582Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-12-04T09:17:18.3999441Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-12-04T09:17:18.4001303Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-12-04T09:17:18.4003812Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-12-04T09:17:18.4005607Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-12-04T09:17:18.4007437Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-12-04T09:17:18.4010307Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-12-04T09:17:18.4012144Z * [new branch] 
gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-12-04T09:17:18.4013928Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 2025-12-04T09:17:18.4016423Z * [new branch] gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base 2025-12-04T09:17:18.4018244Z * [new branch] gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head 2025-12-04T09:17:18.4020214Z * [new branch] gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig 2025-12-04T09:17:18.4023216Z * [new branch] gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base 2025-12-04T09:17:18.4025125Z * [new branch] gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head 2025-12-04T09:17:18.4026945Z * [new branch] gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig 2025-12-04T09:17:18.4029573Z * [new branch] gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base 2025-12-04T09:17:18.4031146Z * [new branch] gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head 2025-12-04T09:17:18.4033154Z * [new branch] gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig 2025-12-04T09:17:18.4036259Z * [new branch] gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base 2025-12-04T09:17:18.4038168Z * [new branch] gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head 2025-12-04T09:17:18.4040108Z * [new branch] gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig 2025-12-04T09:17:18.4042770Z * [new branch] gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base 2025-12-04T09:17:18.4044575Z * [new branch] gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head 2025-12-04T09:17:18.4046456Z * [new branch] gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig 2025-12-04T09:17:18.4048988Z * [new branch] gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base 2025-12-04T09:17:18.4050848Z * [new branch] 
gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head 2025-12-04T09:17:18.4052932Z * [new branch] gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig 2025-12-04T09:17:18.4055333Z * [new branch] gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base 2025-12-04T09:17:18.4057415Z * [new branch] gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head 2025-12-04T09:17:18.4058793Z * [new branch] gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig 2025-12-04T09:17:18.4061733Z * [new branch] gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base 2025-12-04T09:17:18.4063701Z * [new branch] gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head 2025-12-04T09:17:18.4065657Z * [new branch] gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig 2025-12-04T09:17:18.4068213Z * [new branch] gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base 2025-12-04T09:17:18.4069992Z * [new branch] gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head 2025-12-04T09:17:18.4071949Z * [new branch] gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig 2025-12-04T09:17:18.4074477Z * [new branch] gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base 2025-12-04T09:17:18.4076295Z * [new branch] gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head 2025-12-04T09:17:18.4078101Z * [new branch] gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig 2025-12-04T09:17:18.4080698Z * [new branch] gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base 2025-12-04T09:17:18.4084827Z * [new branch] gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head 2025-12-04T09:17:18.4085655Z * [new branch] gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig 2025-12-04T09:17:18.4087613Z * [new branch] gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base 2025-12-04T09:17:18.4089359Z * [new branch] 
gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head 2025-12-04T09:17:18.4091172Z * [new branch] gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig 2025-12-04T09:17:18.4093754Z * [new branch] gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base 2025-12-04T09:17:18.4095666Z * [new branch] gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head 2025-12-04T09:17:18.4098029Z * [new branch] gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig 2025-12-04T09:17:18.4101379Z * [new branch] gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base 2025-12-04T09:17:18.4102415Z * [new branch] gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head 2025-12-04T09:17:18.4104612Z * [new branch] gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig 2025-12-04T09:17:18.4107218Z * [new branch] gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base 2025-12-04T09:17:18.4109070Z * [new branch] gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head 2025-12-04T09:17:18.4111049Z * [new branch] gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig 2025-12-04T09:17:18.4113626Z * [new branch] gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base 2025-12-04T09:17:18.4115582Z * [new branch] gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head 2025-12-04T09:17:18.4117478Z * [new branch] gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig 2025-12-04T09:17:18.4120112Z * [new branch] gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base 2025-12-04T09:17:18.4122036Z * [new branch] gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head 2025-12-04T09:17:18.4123915Z * [new branch] gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig 2025-12-04T09:17:18.4126637Z * [new branch] gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base 2025-12-04T09:17:18.4128443Z * [new branch] 
gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head 2025-12-04T09:17:18.4130866Z * [new branch] gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig 2025-12-04T09:17:18.4133964Z * [new branch] gh/hameerabbasi/1/base -> origin/gh/hameerabbasi/1/base 2025-12-04T09:17:18.4135829Z * [new branch] gh/hameerabbasi/1/head -> origin/gh/hameerabbasi/1/head 2025-12-04T09:17:18.4138283Z * [new branch] gh/hameerabbasi/2/base -> origin/gh/hameerabbasi/2/base 2025-12-04T09:17:18.4140393Z * [new branch] gh/hameerabbasi/2/head -> origin/gh/hameerabbasi/2/head 2025-12-04T09:17:18.4142269Z * [new branch] gh/hameerabbasi/2/orig -> origin/gh/hameerabbasi/2/orig 2025-12-04T09:17:18.4145229Z * [new branch] gh/hameerabbasi/3/base -> origin/gh/hameerabbasi/3/base 2025-12-04T09:17:18.4147104Z * [new branch] gh/hameerabbasi/3/head -> origin/gh/hameerabbasi/3/head 2025-12-04T09:17:18.4149158Z * [new branch] gh/hameerabbasi/3/orig -> origin/gh/hameerabbasi/3/orig 2025-12-04T09:17:18.4151577Z * [new branch] gh/hameerabbasi/4/base -> origin/gh/hameerabbasi/4/base 2025-12-04T09:17:18.4153461Z * [new branch] gh/hameerabbasi/4/head -> origin/gh/hameerabbasi/4/head 2025-12-04T09:17:18.4155253Z * [new branch] gh/hameerabbasi/4/orig -> origin/gh/hameerabbasi/4/orig 2025-12-04T09:17:18.4158410Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-12-04T09:17:18.4160781Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-12-04T09:17:18.4163432Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-12-04T09:17:18.4165981Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-12-04T09:17:18.4168482Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-12-04T09:17:18.4170989Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-12-04T09:17:18.4174057Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-12-04T09:17:18.4175976Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 
2025-12-04T09:17:18.4179397Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-12-04T09:17:18.4181244Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-12-04T09:17:18.4183831Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-12-04T09:17:18.4185939Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-12-04T09:17:18.4187542Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-12-04T09:17:18.4190648Z * [new branch] gh/isuruf/158/base -> origin/gh/isuruf/158/base 2025-12-04T09:17:18.4192510Z * [new branch] gh/isuruf/158/head -> origin/gh/isuruf/158/head 2025-12-04T09:17:18.4195049Z * [new branch] gh/isuruf/159/base -> origin/gh/isuruf/159/base 2025-12-04T09:17:18.4196808Z * [new branch] gh/isuruf/159/head -> origin/gh/isuruf/159/head 2025-12-04T09:17:18.4199362Z * [new branch] gh/isuruf/160/base -> origin/gh/isuruf/160/base 2025-12-04T09:17:18.4201210Z * [new branch] gh/isuruf/160/head -> origin/gh/isuruf/160/head 2025-12-04T09:17:18.4203094Z * [new branch] gh/isuruf/160/orig -> origin/gh/isuruf/160/orig 2025-12-04T09:17:18.4205692Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-12-04T09:17:18.4207543Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-12-04T09:17:18.4209732Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-12-04T09:17:18.4212780Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-12-04T09:17:18.4214626Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-12-04T09:17:18.4216523Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-12-04T09:17:18.4219157Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-12-04T09:17:18.4221595Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-12-04T09:17:18.4223417Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-12-04T09:17:18.4225971Z * [new branch] 
gh/jamesjwu/196/base -> origin/gh/jamesjwu/196/base 2025-12-04T09:17:18.4227837Z * [new branch] gh/jamesjwu/196/head -> origin/gh/jamesjwu/196/head 2025-12-04T09:17:18.4229693Z * [new branch] gh/jamesjwu/196/orig -> origin/gh/jamesjwu/196/orig 2025-12-04T09:17:18.4232238Z * [new branch] gh/jamesjwu/198/base -> origin/gh/jamesjwu/198/base 2025-12-04T09:17:18.4234151Z * [new branch] gh/jamesjwu/198/head -> origin/gh/jamesjwu/198/head 2025-12-04T09:17:18.4236066Z * [new branch] gh/jamesjwu/198/orig -> origin/gh/jamesjwu/198/orig 2025-12-04T09:17:18.4238767Z * [new branch] gh/jamesjwu/207/base -> origin/gh/jamesjwu/207/base 2025-12-04T09:17:18.4240798Z * [new branch] gh/jamesjwu/207/head -> origin/gh/jamesjwu/207/head 2025-12-04T09:17:18.4242663Z * [new branch] gh/jamesjwu/207/orig -> origin/gh/jamesjwu/207/orig 2025-12-04T09:17:18.4245290Z * [new branch] gh/jamesjwu/208/base -> origin/gh/jamesjwu/208/base 2025-12-04T09:17:18.4247125Z * [new branch] gh/jamesjwu/208/head -> origin/gh/jamesjwu/208/head 2025-12-04T09:17:18.4248998Z * [new branch] gh/jamesjwu/208/orig -> origin/gh/jamesjwu/208/orig 2025-12-04T09:17:18.4251622Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-12-04T09:17:18.4253394Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-12-04T09:17:18.4255983Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-12-04T09:17:18.4257523Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-12-04T09:17:18.4260106Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-12-04T09:17:18.4261931Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-12-04T09:17:18.4264281Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-12-04T09:17:18.4266136Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-12-04T09:17:18.4268481Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-12-04T09:17:18.4270264Z * [new branch] 
gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-12-04T09:17:18.4272648Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-12-04T09:17:18.4274451Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-12-04T09:17:18.4276730Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-12-04T09:17:18.4278554Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-12-04T09:17:18.4281032Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-12-04T09:17:18.4283027Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-12-04T09:17:18.4285284Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-12-04T09:17:18.4287100Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-12-04T09:17:18.4289495Z * [new branch] gh/jamesjwu/61/base -> origin/gh/jamesjwu/61/base 2025-12-04T09:17:18.4291254Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-12-04T09:17:18.4294531Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-12-04T09:17:18.4295688Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-12-04T09:17:18.4298235Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-12-04T09:17:18.4300790Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-12-04T09:17:18.4303759Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-12-04T09:17:18.4305721Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-12-04T09:17:18.4308293Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-12-04T09:17:18.4310173Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-12-04T09:17:18.4313313Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-12-04T09:17:18.4315298Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-12-04T09:17:18.4317138Z * [new branch] gh/janeyx99/165/orig 
-> origin/gh/janeyx99/165/orig 2025-12-04T09:17:18.4319423Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-12-04T09:17:18.4321244Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-12-04T09:17:18.4323077Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-12-04T09:17:18.4325808Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-12-04T09:17:18.4327621Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-12-04T09:17:18.4329546Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-12-04T09:17:18.4332058Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-12-04T09:17:18.4334016Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-12-04T09:17:18.4335679Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-12-04T09:17:18.4338549Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-12-04T09:17:18.4340583Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-12-04T09:17:18.4342891Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-12-04T09:17:18.4344670Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-12-04T09:17:18.4347220Z * [new branch] gh/janeyx99/305/base -> origin/gh/janeyx99/305/base 2025-12-04T09:17:18.4349105Z * [new branch] gh/janeyx99/305/head -> origin/gh/janeyx99/305/head 2025-12-04T09:17:18.4351424Z * [new branch] gh/janeyx99/306/base -> origin/gh/janeyx99/306/base 2025-12-04T09:17:18.4353233Z * [new branch] gh/janeyx99/306/head -> origin/gh/janeyx99/306/head 2025-12-04T09:17:18.4355712Z * [new branch] gh/janeyx99/314/base -> origin/gh/janeyx99/314/base 2025-12-04T09:17:18.4357640Z * [new branch] gh/janeyx99/314/head -> origin/gh/janeyx99/314/head 2025-12-04T09:17:18.4359453Z * [new branch] gh/janeyx99/314/orig -> origin/gh/janeyx99/314/orig 2025-12-04T09:17:18.4361946Z * [new branch] 
gh/janeyx99/315/base -> origin/gh/janeyx99/315/base 2025-12-04T09:17:18.4363765Z * [new branch] gh/janeyx99/315/head -> origin/gh/janeyx99/315/head 2025-12-04T09:17:18.4365687Z * [new branch] gh/janeyx99/315/orig -> origin/gh/janeyx99/315/orig 2025-12-04T09:17:18.4368177Z * [new branch] gh/janeyx99/316/base -> origin/gh/janeyx99/316/base 2025-12-04T09:17:18.4370027Z * [new branch] gh/janeyx99/316/head -> origin/gh/janeyx99/316/head 2025-12-04T09:17:18.4372440Z * [new branch] gh/janeyx99/316/orig -> origin/gh/janeyx99/316/orig 2025-12-04T09:17:18.4375040Z * [new branch] gh/janeyx99/317/base -> origin/gh/janeyx99/317/base 2025-12-04T09:17:18.4376838Z * [new branch] gh/janeyx99/317/head -> origin/gh/janeyx99/317/head 2025-12-04T09:17:18.4378771Z * [new branch] gh/janeyx99/317/orig -> origin/gh/janeyx99/317/orig 2025-12-04T09:17:18.4381614Z * [new branch] gh/janeyx99/325/base -> origin/gh/janeyx99/325/base 2025-12-04T09:17:18.4383472Z * [new branch] gh/janeyx99/325/head -> origin/gh/janeyx99/325/head 2025-12-04T09:17:18.4385339Z * [new branch] gh/janeyx99/325/orig -> origin/gh/janeyx99/325/orig 2025-12-04T09:17:18.4387876Z * [new branch] gh/janeyx99/327/base -> origin/gh/janeyx99/327/base 2025-12-04T09:17:18.4389734Z * [new branch] gh/janeyx99/327/head -> origin/gh/janeyx99/327/head 2025-12-04T09:17:18.4391576Z * [new branch] gh/janeyx99/327/orig -> origin/gh/janeyx99/327/orig 2025-12-04T09:17:18.4394069Z * [new branch] gh/janeyx99/328/base -> origin/gh/janeyx99/328/base 2025-12-04T09:17:18.4395929Z * [new branch] gh/janeyx99/328/head -> origin/gh/janeyx99/328/head 2025-12-04T09:17:18.4397757Z * [new branch] gh/janeyx99/328/orig -> origin/gh/janeyx99/328/orig 2025-12-04T09:17:18.4400146Z * [new branch] gh/janeyx99/329/base -> origin/gh/janeyx99/329/base 2025-12-04T09:17:18.4401992Z * [new branch] gh/janeyx99/329/head -> origin/gh/janeyx99/329/head 2025-12-04T09:17:18.4403905Z * [new branch] gh/janeyx99/329/orig -> origin/gh/janeyx99/329/orig 
2025-12-04T09:17:18.4406994Z * [new branch] gh/janeyx99/330/base -> origin/gh/janeyx99/330/base 2025-12-04T09:17:18.4409368Z * [new branch] gh/janeyx99/330/head -> origin/gh/janeyx99/330/head 2025-12-04T09:17:18.4413171Z * [new branch] gh/janeyx99/330/orig -> origin/gh/janeyx99/330/orig 2025-12-04T09:17:18.4415740Z * [new branch] gh/janeyx99/331/base -> origin/gh/janeyx99/331/base 2025-12-04T09:17:18.4417568Z * [new branch] gh/janeyx99/331/head -> origin/gh/janeyx99/331/head 2025-12-04T09:17:18.4419424Z * [new branch] gh/janeyx99/331/orig -> origin/gh/janeyx99/331/orig 2025-12-04T09:17:18.4422334Z * [new branch] gh/janeyx99/332/base -> origin/gh/janeyx99/332/base 2025-12-04T09:17:18.4423873Z * [new branch] gh/janeyx99/332/head -> origin/gh/janeyx99/332/head 2025-12-04T09:17:18.4425697Z * [new branch] gh/janeyx99/332/orig -> origin/gh/janeyx99/332/orig 2025-12-04T09:17:18.4428173Z * [new branch] gh/janeyx99/333/base -> origin/gh/janeyx99/333/base 2025-12-04T09:17:18.4429995Z * [new branch] gh/janeyx99/333/head -> origin/gh/janeyx99/333/head 2025-12-04T09:17:18.4431913Z * [new branch] gh/janeyx99/333/orig -> origin/gh/janeyx99/333/orig 2025-12-04T09:17:18.4434581Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-12-04T09:17:18.4436386Z * [new branch] gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-12-04T09:17:18.4438205Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-12-04T09:17:18.4441254Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-12-04T09:17:18.4443098Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-12-04T09:17:18.4445569Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-12-04T09:17:18.4447444Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-12-04T09:17:18.4449776Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-12-04T09:17:18.4452270Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 
2025-12-04T09:17:18.4454073Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-12-04T09:17:18.4456052Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-12-04T09:17:18.4459129Z * [new branch] gh/jansel/533/base -> origin/gh/jansel/533/base 2025-12-04T09:17:18.4461055Z * [new branch] gh/jansel/533/head -> origin/gh/jansel/533/head 2025-12-04T09:17:18.4462838Z * [new branch] gh/jansel/533/orig -> origin/gh/jansel/533/orig 2025-12-04T09:17:18.4465340Z * [new branch] gh/jansel/552/base -> origin/gh/jansel/552/base 2025-12-04T09:17:18.4467110Z * [new branch] gh/jansel/552/head -> origin/gh/jansel/552/head 2025-12-04T09:17:18.4468933Z * [new branch] gh/jansel/552/orig -> origin/gh/jansel/552/orig 2025-12-04T09:17:18.4471460Z * [new branch] gh/jansel/553/base -> origin/gh/jansel/553/base 2025-12-04T09:17:18.4473268Z * [new branch] gh/jansel/553/head -> origin/gh/jansel/553/head 2025-12-04T09:17:18.4475070Z * [new branch] gh/jansel/553/orig -> origin/gh/jansel/553/orig 2025-12-04T09:17:18.4478038Z * [new branch] gh/jansel/554/base -> origin/gh/jansel/554/base 2025-12-04T09:17:18.4479876Z * [new branch] gh/jansel/554/head -> origin/gh/jansel/554/head 2025-12-04T09:17:18.4482030Z * [new branch] gh/jansel/554/orig -> origin/gh/jansel/554/orig 2025-12-04T09:17:18.4484547Z * [new branch] gh/jansel/555/base -> origin/gh/jansel/555/base 2025-12-04T09:17:18.4486505Z * [new branch] gh/jansel/555/head -> origin/gh/jansel/555/head 2025-12-04T09:17:18.4488518Z * [new branch] gh/jansel/555/orig -> origin/gh/jansel/555/orig 2025-12-04T09:17:18.4491009Z * [new branch] gh/jansel/556/base -> origin/gh/jansel/556/base 2025-12-04T09:17:18.4493113Z * [new branch] gh/jansel/556/head -> origin/gh/jansel/556/head 2025-12-04T09:17:18.4495241Z * [new branch] gh/jansel/556/orig -> origin/gh/jansel/556/orig 2025-12-04T09:17:18.4498944Z * [new branch] gh/jansel/557/base -> origin/gh/jansel/557/base 2025-12-04T09:17:18.4501629Z * [new branch] gh/jansel/557/head -> 
origin/gh/jansel/557/head 2025-12-04T09:17:18.4504080Z * [new branch] gh/jansel/557/orig -> origin/gh/jansel/557/orig 2025-12-04T09:17:18.4507378Z * [new branch] gh/jansel/558/base -> origin/gh/jansel/558/base 2025-12-04T09:17:18.4510220Z * [new branch] gh/jansel/558/head -> origin/gh/jansel/558/head 2025-12-04T09:17:18.4512627Z * [new branch] gh/jansel/558/orig -> origin/gh/jansel/558/orig 2025-12-04T09:17:18.4515985Z * [new branch] gh/jansel/559/base -> origin/gh/jansel/559/base 2025-12-04T09:17:18.4518328Z * [new branch] gh/jansel/559/head -> origin/gh/jansel/559/head 2025-12-04T09:17:18.4520804Z * [new branch] gh/jansel/559/orig -> origin/gh/jansel/559/orig 2025-12-04T09:17:18.4524225Z * [new branch] gh/jansel/560/base -> origin/gh/jansel/560/base 2025-12-04T09:17:18.4525964Z * [new branch] gh/jansel/560/head -> origin/gh/jansel/560/head 2025-12-04T09:17:18.4527835Z * [new branch] gh/jansel/560/orig -> origin/gh/jansel/560/orig 2025-12-04T09:17:18.4530310Z * [new branch] gh/jansel/561/base -> origin/gh/jansel/561/base 2025-12-04T09:17:18.4532129Z * [new branch] gh/jansel/561/head -> origin/gh/jansel/561/head 2025-12-04T09:17:18.4533916Z * [new branch] gh/jansel/561/orig -> origin/gh/jansel/561/orig 2025-12-04T09:17:18.4536546Z * [new branch] gh/jansel/562/base -> origin/gh/jansel/562/base 2025-12-04T09:17:18.4538277Z * [new branch] gh/jansel/562/head -> origin/gh/jansel/562/head 2025-12-04T09:17:18.4540362Z * [new branch] gh/jansel/562/orig -> origin/gh/jansel/562/orig 2025-12-04T09:17:18.4542851Z * [new branch] gh/jansel/563/base -> origin/gh/jansel/563/base 2025-12-04T09:17:18.4544715Z * [new branch] gh/jansel/563/head -> origin/gh/jansel/563/head 2025-12-04T09:17:18.4546583Z * [new branch] gh/jansel/563/orig -> origin/gh/jansel/563/orig 2025-12-04T09:17:18.4549587Z * [new branch] gh/jansel/564/base -> origin/gh/jansel/564/base 2025-12-04T09:17:18.4551425Z * [new branch] gh/jansel/564/head -> origin/gh/jansel/564/head 2025-12-04T09:17:18.4553279Z * [new 
branch] gh/jansel/564/orig -> origin/gh/jansel/564/orig 2025-12-04T09:17:18.4555857Z * [new branch] gh/jansel/565/base -> origin/gh/jansel/565/base 2025-12-04T09:17:18.4557678Z * [new branch] gh/jansel/565/head -> origin/gh/jansel/565/head 2025-12-04T09:17:18.4559556Z * [new branch] gh/jansel/565/orig -> origin/gh/jansel/565/orig 2025-12-04T09:17:18.4562214Z * [new branch] gh/jansel/566/base -> origin/gh/jansel/566/base 2025-12-04T09:17:18.4563993Z * [new branch] gh/jansel/566/head -> origin/gh/jansel/566/head 2025-12-04T09:17:18.4565903Z * [new branch] gh/jansel/566/orig -> origin/gh/jansel/566/orig 2025-12-04T09:17:18.4568480Z * [new branch] gh/jansel/567/base -> origin/gh/jansel/567/base 2025-12-04T09:17:18.4570498Z * [new branch] gh/jansel/567/head -> origin/gh/jansel/567/head 2025-12-04T09:17:18.4572115Z * [new branch] gh/jansel/567/orig -> origin/gh/jansel/567/orig 2025-12-04T09:17:18.4574752Z * [new branch] gh/jansel/568/base -> origin/gh/jansel/568/base 2025-12-04T09:17:18.4576692Z * [new branch] gh/jansel/568/head -> origin/gh/jansel/568/head 2025-12-04T09:17:18.4578480Z * [new branch] gh/jansel/568/orig -> origin/gh/jansel/568/orig 2025-12-04T09:17:18.4581168Z * [new branch] gh/jansel/569/base -> origin/gh/jansel/569/base 2025-12-04T09:17:18.4582941Z * [new branch] gh/jansel/569/head -> origin/gh/jansel/569/head 2025-12-04T09:17:18.4584736Z * [new branch] gh/jansel/569/orig -> origin/gh/jansel/569/orig 2025-12-04T09:17:18.4587298Z * [new branch] gh/jansel/570/base -> origin/gh/jansel/570/base 2025-12-04T09:17:18.4589107Z * [new branch] gh/jansel/570/head -> origin/gh/jansel/570/head 2025-12-04T09:17:18.4591136Z * [new branch] gh/jansel/570/orig -> origin/gh/jansel/570/orig 2025-12-04T09:17:18.4593697Z * [new branch] gh/jansel/571/base -> origin/gh/jansel/571/base 2025-12-04T09:17:18.4595537Z * [new branch] gh/jansel/571/head -> origin/gh/jansel/571/head 2025-12-04T09:17:18.4597334Z * [new branch] gh/jansel/571/orig -> origin/gh/jansel/571/orig 
2025-12-04T09:17:18.4599805Z * [new branch] gh/jansel/572/base -> origin/gh/jansel/572/base 2025-12-04T09:17:18.4601669Z * [new branch] gh/jansel/572/head -> origin/gh/jansel/572/head 2025-12-04T09:17:18.4603430Z * [new branch] gh/jansel/572/orig -> origin/gh/jansel/572/orig 2025-12-04T09:17:18.4606037Z * [new branch] gh/jansel/573/base -> origin/gh/jansel/573/base 2025-12-04T09:17:18.4608108Z * [new branch] gh/jansel/573/head -> origin/gh/jansel/573/head 2025-12-04T09:17:18.4610067Z * [new branch] gh/jansel/573/orig -> origin/gh/jansel/573/orig 2025-12-04T09:17:18.4612614Z * [new branch] gh/jansel/574/base -> origin/gh/jansel/574/base 2025-12-04T09:17:18.4614393Z * [new branch] gh/jansel/574/head -> origin/gh/jansel/574/head 2025-12-04T09:17:18.4616374Z * [new branch] gh/jansel/574/orig -> origin/gh/jansel/574/orig 2025-12-04T09:17:18.4619008Z * [new branch] gh/jansel/575/base -> origin/gh/jansel/575/base 2025-12-04T09:17:18.4620874Z * [new branch] gh/jansel/575/head -> origin/gh/jansel/575/head 2025-12-04T09:17:18.4622723Z * [new branch] gh/jansel/575/orig -> origin/gh/jansel/575/orig 2025-12-04T09:17:18.4625355Z * [new branch] gh/jansel/576/base -> origin/gh/jansel/576/base 2025-12-04T09:17:18.4627233Z * [new branch] gh/jansel/576/head -> origin/gh/jansel/576/head 2025-12-04T09:17:18.4629045Z * [new branch] gh/jansel/576/orig -> origin/gh/jansel/576/orig 2025-12-04T09:17:18.4632153Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-12-04T09:17:18.4634009Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-12-04T09:17:18.4635829Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-12-04T09:17:18.4638331Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 2025-12-04T09:17:18.4640215Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-12-04T09:17:18.4642062Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 
2025-12-04T09:17:18.4645374Z * [new branch] gh/jerryzh168/1/base -> origin/gh/jerryzh168/1/base 2025-12-04T09:17:18.4647052Z * [new branch] gh/jerryzh168/1/head -> origin/gh/jerryzh168/1/head 2025-12-04T09:17:18.4648908Z * [new branch] gh/jerryzh168/1/orig -> origin/gh/jerryzh168/1/orig 2025-12-04T09:17:18.4651936Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-12-04T09:17:18.4653921Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-12-04T09:17:18.4655553Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-12-04T09:17:18.4658018Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-12-04T09:17:18.4660012Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-12-04T09:17:18.4661789Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-12-04T09:17:18.4664399Z * [new branch] gh/jiayisunx/68/base -> origin/gh/jiayisunx/68/base 2025-12-04T09:17:18.4666205Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-12-04T09:17:18.4668059Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-12-04T09:17:18.4670569Z * [new branch] gh/jiayisunx/77/base -> origin/gh/jiayisunx/77/base 2025-12-04T09:17:18.4672357Z * [new branch] gh/jiayisunx/77/head -> origin/gh/jiayisunx/77/head 2025-12-04T09:17:18.4674161Z * [new branch] gh/jiayisunx/77/orig -> origin/gh/jiayisunx/77/orig 2025-12-04T09:17:18.4676688Z * [new branch] gh/jiayisunx/78/base -> origin/gh/jiayisunx/78/base 2025-12-04T09:17:18.4678522Z * [new branch] gh/jiayisunx/78/head -> origin/gh/jiayisunx/78/head 2025-12-04T09:17:18.4680331Z * [new branch] gh/jiayisunx/78/orig -> origin/gh/jiayisunx/78/orig 2025-12-04T09:17:18.4682857Z * [new branch] gh/jiayisunx/79/base -> origin/gh/jiayisunx/79/base 2025-12-04T09:17:18.4684721Z * [new branch] gh/jiayisunx/79/head -> origin/gh/jiayisunx/79/head 2025-12-04T09:17:18.4686501Z * [new branch] gh/jiayisunx/79/orig -> 
origin/gh/jiayisunx/79/orig 2025-12-04T09:17:18.4689147Z * [new branch] gh/jiayisunx/82/base -> origin/gh/jiayisunx/82/base 2025-12-04T09:17:18.4690955Z * [new branch] gh/jiayisunx/82/head -> origin/gh/jiayisunx/82/head 2025-12-04T09:17:18.4692804Z * [new branch] gh/jiayisunx/82/orig -> origin/gh/jiayisunx/82/orig 2025-12-04T09:17:18.4695277Z * [new branch] gh/jiayisunx/83/base -> origin/gh/jiayisunx/83/base 2025-12-04T09:17:18.4697099Z * [new branch] gh/jiayisunx/83/head -> origin/gh/jiayisunx/83/head 2025-12-04T09:17:18.4698897Z * [new branch] gh/jiayisunx/83/orig -> origin/gh/jiayisunx/83/orig 2025-12-04T09:17:18.4701510Z * [new branch] gh/jiayisunx/84/base -> origin/gh/jiayisunx/84/base 2025-12-04T09:17:18.4703275Z * [new branch] gh/jiayisunx/84/head -> origin/gh/jiayisunx/84/head 2025-12-04T09:17:18.4705071Z * [new branch] gh/jiayisunx/84/orig -> origin/gh/jiayisunx/84/orig 2025-12-04T09:17:18.4707535Z * [new branch] gh/jiayisunx/85/base -> origin/gh/jiayisunx/85/base 2025-12-04T09:17:18.4709544Z * [new branch] gh/jiayisunx/85/head -> origin/gh/jiayisunx/85/head 2025-12-04T09:17:18.4711321Z * [new branch] gh/jiayisunx/85/orig -> origin/gh/jiayisunx/85/orig 2025-12-04T09:17:18.4713868Z * [new branch] gh/jiayisunx/86/base -> origin/gh/jiayisunx/86/base 2025-12-04T09:17:18.4721409Z * [new branch] gh/jiayisunx/86/head -> origin/gh/jiayisunx/86/head 2025-12-04T09:17:18.4722238Z * [new branch] gh/jiayisunx/86/orig -> origin/gh/jiayisunx/86/orig 2025-12-04T09:17:18.4722792Z * [new branch] gh/jiayisunx/87/base -> origin/gh/jiayisunx/87/base 2025-12-04T09:17:18.4723330Z * [new branch] gh/jiayisunx/87/head -> origin/gh/jiayisunx/87/head 2025-12-04T09:17:18.4723984Z * [new branch] gh/jiayisunx/87/orig -> origin/gh/jiayisunx/87/orig 2025-12-04T09:17:18.4726374Z * [new branch] gh/jiayisunx/88/base -> origin/gh/jiayisunx/88/base 2025-12-04T09:17:18.4728091Z * [new branch] gh/jiayisunx/88/head -> origin/gh/jiayisunx/88/head 2025-12-04T09:17:18.4729926Z * [new branch] 
gh/jiayisunx/88/orig -> origin/gh/jiayisunx/88/orig 2025-12-04T09:17:18.4732410Z * [new branch] gh/jiayisunx/89/base -> origin/gh/jiayisunx/89/base 2025-12-04T09:17:18.4734258Z * [new branch] gh/jiayisunx/89/head -> origin/gh/jiayisunx/89/head 2025-12-04T09:17:18.4736128Z * [new branch] gh/jiayisunx/89/orig -> origin/gh/jiayisunx/89/orig 2025-12-04T09:17:18.4738887Z * [new branch] gh/jiayisunx/90/base -> origin/gh/jiayisunx/90/base 2025-12-04T09:17:18.4740945Z * [new branch] gh/jiayisunx/90/head -> origin/gh/jiayisunx/90/head 2025-12-04T09:17:18.4742765Z * [new branch] gh/jiayisunx/90/orig -> origin/gh/jiayisunx/90/orig 2025-12-04T09:17:18.4745640Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-12-04T09:17:18.4747437Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-12-04T09:17:18.4750403Z * [new branch] gh/jturney/1/base -> origin/gh/jturney/1/base 2025-12-04T09:17:18.4752273Z * [new branch] gh/jturney/1/head -> origin/gh/jturney/1/head 2025-12-04T09:17:18.4754069Z * [new branch] gh/jturney/1/orig -> origin/gh/jturney/1/orig 2025-12-04T09:17:18.4756528Z * [new branch] gh/jturney/2/base -> origin/gh/jturney/2/base 2025-12-04T09:17:18.4758348Z * [new branch] gh/jturney/2/head -> origin/gh/jturney/2/head 2025-12-04T09:17:18.4760326Z * [new branch] gh/jturney/2/orig -> origin/gh/jturney/2/orig 2025-12-04T09:17:18.4763527Z * [new branch] gh/karthickai/10/base -> origin/gh/karthickai/10/base 2025-12-04T09:17:18.4765449Z * [new branch] gh/karthickai/10/head -> origin/gh/karthickai/10/head 2025-12-04T09:17:18.4767284Z * [new branch] gh/karthickai/10/orig -> origin/gh/karthickai/10/orig 2025-12-04T09:17:18.4769827Z * [new branch] gh/karthickai/11/base -> origin/gh/karthickai/11/base 2025-12-04T09:17:18.4771718Z * [new branch] gh/karthickai/11/head -> origin/gh/karthickai/11/head 2025-12-04T09:17:18.4773560Z * [new branch] gh/karthickai/11/orig -> origin/gh/karthickai/11/orig 2025-12-04T09:17:18.4776416Z * [new 
branch] gh/karthickai/12/base -> origin/gh/karthickai/12/base 2025-12-04T09:17:18.4778303Z * [new branch] gh/karthickai/12/head -> origin/gh/karthickai/12/head 2025-12-04T09:17:18.4780303Z * [new branch] gh/karthickai/12/orig -> origin/gh/karthickai/12/orig 2025-12-04T09:17:18.4782835Z * [new branch] gh/karthickai/13/base -> origin/gh/karthickai/13/base 2025-12-04T09:17:18.4784841Z * [new branch] gh/karthickai/13/head -> origin/gh/karthickai/13/head 2025-12-04T09:17:18.4786672Z * [new branch] gh/karthickai/13/orig -> origin/gh/karthickai/13/orig 2025-12-04T09:17:18.4789387Z * [new branch] gh/karthickai/14/base -> origin/gh/karthickai/14/base 2025-12-04T09:17:18.4791932Z * [new branch] gh/karthickai/14/head -> origin/gh/karthickai/14/head 2025-12-04T09:17:18.4793918Z * [new branch] gh/karthickai/14/orig -> origin/gh/karthickai/14/orig 2025-12-04T09:17:18.4796583Z * [new branch] gh/karthickai/15/base -> origin/gh/karthickai/15/base 2025-12-04T09:17:18.4798433Z * [new branch] gh/karthickai/15/head -> origin/gh/karthickai/15/head 2025-12-04T09:17:18.4800221Z * [new branch] gh/karthickai/15/orig -> origin/gh/karthickai/15/orig 2025-12-04T09:17:18.4802674Z * [new branch] gh/karthickai/16/base -> origin/gh/karthickai/16/base 2025-12-04T09:17:18.4804561Z * [new branch] gh/karthickai/16/head -> origin/gh/karthickai/16/head 2025-12-04T09:17:18.4806446Z * [new branch] gh/karthickai/16/orig -> origin/gh/karthickai/16/orig 2025-12-04T09:17:18.4808843Z * [new branch] gh/karthickai/17/base -> origin/gh/karthickai/17/base 2025-12-04T09:17:18.4813177Z * [new branch] gh/karthickai/17/head -> origin/gh/karthickai/17/head 2025-12-04T09:17:18.4814981Z * [new branch] gh/karthickai/17/orig -> origin/gh/karthickai/17/orig 2025-12-04T09:17:18.4817814Z * [new branch] gh/karthickai/18/base -> origin/gh/karthickai/18/base 2025-12-04T09:17:18.4820025Z * [new branch] gh/karthickai/18/head -> origin/gh/karthickai/18/head 2025-12-04T09:17:18.4821954Z * [new branch] gh/karthickai/18/orig -> 
origin/gh/karthickai/18/orig 2025-12-04T09:17:18.4824917Z * [new branch] gh/karthickai/19/base -> origin/gh/karthickai/19/base 2025-12-04T09:17:18.4826773Z * [new branch] gh/karthickai/19/head -> origin/gh/karthickai/19/head 2025-12-04T09:17:18.4828586Z * [new branch] gh/karthickai/19/orig -> origin/gh/karthickai/19/orig 2025-12-04T09:17:18.4832019Z * [new branch] gh/karthickai/20/base -> origin/gh/karthickai/20/base 2025-12-04T09:17:18.4834533Z * [new branch] gh/karthickai/20/head -> origin/gh/karthickai/20/head 2025-12-04T09:17:18.4836428Z * [new branch] gh/karthickai/20/orig -> origin/gh/karthickai/20/orig 2025-12-04T09:17:18.4838985Z * [new branch] gh/karthickai/21/base -> origin/gh/karthickai/21/base 2025-12-04T09:17:18.4841193Z * [new branch] gh/karthickai/21/head -> origin/gh/karthickai/21/head 2025-12-04T09:17:18.4843036Z * [new branch] gh/karthickai/21/orig -> origin/gh/karthickai/21/orig 2025-12-04T09:17:18.4845701Z * [new branch] gh/karthickai/22/base -> origin/gh/karthickai/22/base 2025-12-04T09:17:18.4847455Z * [new branch] gh/karthickai/22/head -> origin/gh/karthickai/22/head 2025-12-04T09:17:18.4849382Z * [new branch] gh/karthickai/22/orig -> origin/gh/karthickai/22/orig 2025-12-04T09:17:18.4852040Z * [new branch] gh/karthickai/23/base -> origin/gh/karthickai/23/base 2025-12-04T09:17:18.4853972Z * [new branch] gh/karthickai/23/head -> origin/gh/karthickai/23/head 2025-12-04T09:17:18.4856355Z * [new branch] gh/karthickai/23/orig -> origin/gh/karthickai/23/orig 2025-12-04T09:17:18.4859026Z * [new branch] gh/karthickai/24/base -> origin/gh/karthickai/24/base 2025-12-04T09:17:18.4860921Z * [new branch] gh/karthickai/24/head -> origin/gh/karthickai/24/head 2025-12-04T09:17:18.4862726Z * [new branch] gh/karthickai/24/orig -> origin/gh/karthickai/24/orig 2025-12-04T09:17:18.4865774Z * [new branch] gh/karthickai/25/base -> origin/gh/karthickai/25/base 2025-12-04T09:17:18.4867755Z * [new branch] gh/karthickai/25/head -> origin/gh/karthickai/25/head 
2025-12-04T09:17:18.4869579Z * [new branch] gh/karthickai/25/orig -> origin/gh/karthickai/25/orig 2025-12-04T09:17:18.4872041Z * [new branch] gh/karthickai/26/base -> origin/gh/karthickai/26/base 2025-12-04T09:17:18.4874151Z * [new branch] gh/karthickai/26/head -> origin/gh/karthickai/26/head 2025-12-04T09:17:18.4875810Z * [new branch] gh/karthickai/26/orig -> origin/gh/karthickai/26/orig 2025-12-04T09:17:18.4879645Z * [new branch] gh/karthickai/6/base -> origin/gh/karthickai/6/base 2025-12-04T09:17:18.4882039Z * [new branch] gh/karthickai/6/head -> origin/gh/karthickai/6/head 2025-12-04T09:17:18.4883857Z * [new branch] gh/karthickai/6/orig -> origin/gh/karthickai/6/orig 2025-12-04T09:17:18.4886963Z * [new branch] gh/krocki/1/base -> origin/gh/krocki/1/base 2025-12-04T09:17:18.4888869Z * [new branch] gh/krocki/1/head -> origin/gh/krocki/1/head 2025-12-04T09:17:18.4890670Z * [new branch] gh/krocki/1/orig -> origin/gh/krocki/1/orig 2025-12-04T09:17:18.4893277Z * [new branch] gh/krocki/2/base -> origin/gh/krocki/2/base 2025-12-04T09:17:18.4895160Z * [new branch] gh/krocki/2/head -> origin/gh/krocki/2/head 2025-12-04T09:17:18.4897477Z * [new branch] gh/krocki/2/orig -> origin/gh/krocki/2/orig 2025-12-04T09:17:18.4900795Z * [new branch] gh/kurtamohler/60/base -> origin/gh/kurtamohler/60/base 2025-12-04T09:17:18.4902652Z * [new branch] gh/kurtamohler/60/head -> origin/gh/kurtamohler/60/head 2025-12-04T09:17:18.4904454Z * [new branch] gh/kurtamohler/60/orig -> origin/gh/kurtamohler/60/orig 2025-12-04T09:17:18.4906969Z * [new branch] gh/kurtamohler/61/base -> origin/gh/kurtamohler/61/base 2025-12-04T09:17:18.4908799Z * [new branch] gh/kurtamohler/61/head -> origin/gh/kurtamohler/61/head 2025-12-04T09:17:18.4910834Z * [new branch] gh/kurtamohler/61/orig -> origin/gh/kurtamohler/61/orig 2025-12-04T09:17:18.4913494Z * [new branch] gh/kurtamohler/62/base -> origin/gh/kurtamohler/62/base 2025-12-04T09:17:18.4915579Z * [new branch] gh/kurtamohler/62/head -> 
origin/gh/kurtamohler/62/head 2025-12-04T09:17:18.4917426Z * [new branch] gh/kurtamohler/62/orig -> origin/gh/kurtamohler/62/orig 2025-12-04T09:17:18.4920094Z * [new branch] gh/kurtamohler/63/base -> origin/gh/kurtamohler/63/base 2025-12-04T09:17:18.4921901Z * [new branch] gh/kurtamohler/63/head -> origin/gh/kurtamohler/63/head 2025-12-04T09:17:18.4923722Z * [new branch] gh/kurtamohler/63/orig -> origin/gh/kurtamohler/63/orig 2025-12-04T09:17:18.4926248Z * [new branch] gh/kurtamohler/64/base -> origin/gh/kurtamohler/64/base 2025-12-04T09:17:18.4928021Z * [new branch] gh/kurtamohler/64/head -> origin/gh/kurtamohler/64/head 2025-12-04T09:17:18.4929834Z * [new branch] gh/kurtamohler/64/orig -> origin/gh/kurtamohler/64/orig 2025-12-04T09:17:18.4932315Z * [new branch] gh/kurtamohler/65/base -> origin/gh/kurtamohler/65/base 2025-12-04T09:17:18.4934162Z * [new branch] gh/kurtamohler/65/head -> origin/gh/kurtamohler/65/head 2025-12-04T09:17:18.4936071Z * [new branch] gh/kurtamohler/65/orig -> origin/gh/kurtamohler/65/orig 2025-12-04T09:17:18.4938448Z * [new branch] gh/kurtamohler/66/base -> origin/gh/kurtamohler/66/base 2025-12-04T09:17:18.4940483Z * [new branch] gh/kurtamohler/66/head -> origin/gh/kurtamohler/66/head 2025-12-04T09:17:18.4942283Z * [new branch] gh/kurtamohler/66/orig -> origin/gh/kurtamohler/66/orig 2025-12-04T09:17:18.4945043Z * [new branch] gh/kurtamohler/67/base -> origin/gh/kurtamohler/67/base 2025-12-04T09:17:18.4946814Z * [new branch] gh/kurtamohler/67/head -> origin/gh/kurtamohler/67/head 2025-12-04T09:17:18.4948840Z * [new branch] gh/kurtamohler/67/orig -> origin/gh/kurtamohler/67/orig 2025-12-04T09:17:18.4952057Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-12-04T09:17:18.4954380Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-12-04T09:17:18.4956209Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-12-04T09:17:18.4958781Z * [new branch] gh/kwen2501/170/base -> 
origin/gh/kwen2501/170/base 2025-12-04T09:17:18.4960568Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-12-04T09:17:18.4963175Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-12-04T09:17:18.4965036Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-12-04T09:17:18.4966860Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-12-04T09:17:18.4969477Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-12-04T09:17:18.4971323Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-12-04T09:17:18.4973073Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-12-04T09:17:18.4975558Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-12-04T09:17:18.4977433Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-12-04T09:17:18.4980019Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-12-04T09:17:18.4981821Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-12-04T09:17:18.4983592Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-12-04T09:17:18.4986130Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-12-04T09:17:18.4987922Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-12-04T09:17:18.4989938Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 2025-12-04T09:17:18.4992688Z * [new branch] gh/kwen2501/234/base -> origin/gh/kwen2501/234/base 2025-12-04T09:17:18.4994587Z * [new branch] gh/kwen2501/234/head -> origin/gh/kwen2501/234/head 2025-12-04T09:17:18.4996347Z * [new branch] gh/kwen2501/234/orig -> origin/gh/kwen2501/234/orig 2025-12-04T09:17:18.4998832Z * [new branch] gh/kwen2501/235/base -> origin/gh/kwen2501/235/base 2025-12-04T09:17:18.5000661Z * [new branch] gh/kwen2501/235/head -> origin/gh/kwen2501/235/head 2025-12-04T09:17:18.5002474Z * [new branch] 
gh/kwen2501/235/orig -> origin/gh/kwen2501/235/orig 2025-12-04T09:17:18.5004941Z * [new branch] gh/kwen2501/236/base -> origin/gh/kwen2501/236/base 2025-12-04T09:17:18.5006756Z * [new branch] gh/kwen2501/236/head -> origin/gh/kwen2501/236/head 2025-12-04T09:17:18.5008853Z * [new branch] gh/kwen2501/236/orig -> origin/gh/kwen2501/236/orig 2025-12-04T09:17:18.5011283Z * [new branch] gh/kwen2501/237/base -> origin/gh/kwen2501/237/base 2025-12-04T09:17:18.5013139Z * [new branch] gh/kwen2501/237/head -> origin/gh/kwen2501/237/head 2025-12-04T09:17:18.5015010Z * [new branch] gh/kwen2501/237/orig -> origin/gh/kwen2501/237/orig 2025-12-04T09:17:18.5017576Z * [new branch] gh/kwen2501/238/base -> origin/gh/kwen2501/238/base 2025-12-04T09:17:18.5019469Z * [new branch] gh/kwen2501/238/head -> origin/gh/kwen2501/238/head 2025-12-04T09:17:18.5021433Z * [new branch] gh/kwen2501/238/orig -> origin/gh/kwen2501/238/orig 2025-12-04T09:17:18.5024264Z * [new branch] gh/kwen2501/240/base -> origin/gh/kwen2501/240/base 2025-12-04T09:17:18.5025703Z * [new branch] gh/kwen2501/240/head -> origin/gh/kwen2501/240/head 2025-12-04T09:17:18.5027478Z * [new branch] gh/kwen2501/240/orig -> origin/gh/kwen2501/240/orig 2025-12-04T09:17:18.5029921Z * [new branch] gh/kwen2501/241/base -> origin/gh/kwen2501/241/base 2025-12-04T09:17:18.5031749Z * [new branch] gh/kwen2501/241/head -> origin/gh/kwen2501/241/head 2025-12-04T09:17:18.5033508Z * [new branch] gh/kwen2501/241/orig -> origin/gh/kwen2501/241/orig 2025-12-04T09:17:18.5036008Z * [new branch] gh/kwen2501/247/base -> origin/gh/kwen2501/247/base 2025-12-04T09:17:18.5037817Z * [new branch] gh/kwen2501/247/head -> origin/gh/kwen2501/247/head 2025-12-04T09:17:18.5039651Z * [new branch] gh/kwen2501/247/orig -> origin/gh/kwen2501/247/orig 2025-12-04T09:17:18.5042239Z * [new branch] gh/kwen2501/252/base -> origin/gh/kwen2501/252/base 2025-12-04T09:17:18.5044077Z * [new branch] gh/kwen2501/252/head -> origin/gh/kwen2501/252/head 
2025-12-04T09:17:18.5045845Z * [new branch] gh/kwen2501/252/orig -> origin/gh/kwen2501/252/orig 2025-12-04T09:17:18.5048903Z * [new branch] gh/kwen2501/259/base -> origin/gh/kwen2501/259/base 2025-12-04T09:17:18.5050788Z * [new branch] gh/kwen2501/259/head -> origin/gh/kwen2501/259/head 2025-12-04T09:17:18.5052684Z * [new branch] gh/kwen2501/259/orig -> origin/gh/kwen2501/259/orig 2025-12-04T09:17:18.5055340Z * [new branch] gh/kwen2501/260/base -> origin/gh/kwen2501/260/base 2025-12-04T09:17:18.5057281Z * [new branch] gh/kwen2501/260/head -> origin/gh/kwen2501/260/head 2025-12-04T09:17:18.5059175Z * [new branch] gh/kwen2501/260/orig -> origin/gh/kwen2501/260/orig 2025-12-04T09:17:18.5061799Z * [new branch] gh/kwen2501/268/base -> origin/gh/kwen2501/268/base 2025-12-04T09:17:18.5063611Z * [new branch] gh/kwen2501/268/head -> origin/gh/kwen2501/268/head 2025-12-04T09:17:18.5065445Z * [new branch] gh/kwen2501/268/orig -> origin/gh/kwen2501/268/orig 2025-12-04T09:17:18.5068204Z * [new branch] gh/kwen2501/269/base -> origin/gh/kwen2501/269/base 2025-12-04T09:17:18.5070140Z * [new branch] gh/kwen2501/269/head -> origin/gh/kwen2501/269/head 2025-12-04T09:17:18.5071917Z * [new branch] gh/kwen2501/269/orig -> origin/gh/kwen2501/269/orig 2025-12-04T09:17:18.5074582Z * [new branch] gh/kwen2501/270/base -> origin/gh/kwen2501/270/base 2025-12-04T09:17:18.5076550Z * [new branch] gh/kwen2501/270/head -> origin/gh/kwen2501/270/head 2025-12-04T09:17:18.5078397Z * [new branch] gh/kwen2501/270/orig -> origin/gh/kwen2501/270/orig 2025-12-04T09:17:18.5081072Z * [new branch] gh/kwen2501/271/base -> origin/gh/kwen2501/271/base 2025-12-04T09:17:18.5082880Z * [new branch] gh/kwen2501/271/head -> origin/gh/kwen2501/271/head 2025-12-04T09:17:18.5084710Z * [new branch] gh/kwen2501/271/orig -> origin/gh/kwen2501/271/orig 2025-12-04T09:17:18.5088006Z * [new branch] gh/kwen2501/274/base -> origin/gh/kwen2501/274/base 2025-12-04T09:17:18.5089953Z * [new branch] gh/kwen2501/274/head -> 
origin/gh/kwen2501/274/head 2025-12-04T09:17:18.5091786Z * [new branch] gh/kwen2501/274/orig -> origin/gh/kwen2501/274/orig 2025-12-04T09:17:18.5094599Z * [new branch] gh/kwen2501/275/base -> origin/gh/kwen2501/275/base 2025-12-04T09:17:18.5096567Z * [new branch] gh/kwen2501/275/head -> origin/gh/kwen2501/275/head 2025-12-04T09:17:18.5098468Z * [new branch] gh/kwen2501/275/orig -> origin/gh/kwen2501/275/orig 2025-12-04T09:17:18.5101194Z * [new branch] gh/kwen2501/276/base -> origin/gh/kwen2501/276/base 2025-12-04T09:17:18.5102986Z * [new branch] gh/kwen2501/276/head -> origin/gh/kwen2501/276/head 2025-12-04T09:17:18.5104759Z * [new branch] gh/kwen2501/276/orig -> origin/gh/kwen2501/276/orig 2025-12-04T09:17:18.5108119Z * [new branch] gh/kwen2501/277/base -> origin/gh/kwen2501/277/base 2025-12-04T09:17:18.5109960Z * [new branch] gh/kwen2501/277/head -> origin/gh/kwen2501/277/head 2025-12-04T09:17:18.5111686Z * [new branch] gh/kwen2501/277/orig -> origin/gh/kwen2501/277/orig 2025-12-04T09:17:18.5114753Z * [new branch] gh/kwen2501/278/base -> origin/gh/kwen2501/278/base 2025-12-04T09:17:18.5116574Z * [new branch] gh/kwen2501/278/head -> origin/gh/kwen2501/278/head 2025-12-04T09:17:18.5118408Z * [new branch] gh/kwen2501/278/orig -> origin/gh/kwen2501/278/orig 2025-12-04T09:17:18.5121642Z * [new branch] gh/kwen2501/279/base -> origin/gh/kwen2501/279/base 2025-12-04T09:17:18.5123607Z * [new branch] gh/kwen2501/279/head -> origin/gh/kwen2501/279/head 2025-12-04T09:17:18.5125517Z * [new branch] gh/kwen2501/279/orig -> origin/gh/kwen2501/279/orig 2025-12-04T09:17:18.5128152Z * [new branch] gh/kwen2501/280/base -> origin/gh/kwen2501/280/base 2025-12-04T09:17:18.5129991Z * [new branch] gh/kwen2501/280/head -> origin/gh/kwen2501/280/head 2025-12-04T09:17:18.5131885Z * [new branch] gh/kwen2501/280/orig -> origin/gh/kwen2501/280/orig 2025-12-04T09:17:18.5134433Z * [new branch] gh/kwen2501/281/base -> origin/gh/kwen2501/281/base 2025-12-04T09:17:18.5136323Z * [new branch] 
gh/kwen2501/281/head -> origin/gh/kwen2501/281/head 2025-12-04T09:17:18.5138198Z * [new branch] gh/kwen2501/281/orig -> origin/gh/kwen2501/281/orig 2025-12-04T09:17:18.5140963Z * [new branch] gh/kwen2501/282/base -> origin/gh/kwen2501/282/base 2025-12-04T09:17:18.5142821Z * [new branch] gh/kwen2501/282/head -> origin/gh/kwen2501/282/head 2025-12-04T09:17:18.5144617Z * [new branch] gh/kwen2501/282/orig -> origin/gh/kwen2501/282/orig 2025-12-04T09:17:18.5147286Z * [new branch] gh/kwen2501/283/base -> origin/gh/kwen2501/283/base 2025-12-04T09:17:18.5149131Z * [new branch] gh/kwen2501/283/head -> origin/gh/kwen2501/283/head 2025-12-04T09:17:18.5151260Z * [new branch] gh/kwen2501/283/orig -> origin/gh/kwen2501/283/orig 2025-12-04T09:17:18.5153888Z * [new branch] gh/kwen2501/284/base -> origin/gh/kwen2501/284/base 2025-12-04T09:17:18.5155839Z * [new branch] gh/kwen2501/284/head -> origin/gh/kwen2501/284/head 2025-12-04T09:17:18.5157666Z * [new branch] gh/kwen2501/284/orig -> origin/gh/kwen2501/284/orig 2025-12-04T09:17:18.5160197Z * [new branch] gh/kwen2501/285/base -> origin/gh/kwen2501/285/base 2025-12-04T09:17:18.5162021Z * [new branch] gh/kwen2501/285/head -> origin/gh/kwen2501/285/head 2025-12-04T09:17:18.5163849Z * [new branch] gh/kwen2501/285/orig -> origin/gh/kwen2501/285/orig 2025-12-04T09:17:18.5166414Z * [new branch] gh/kwen2501/286/base -> origin/gh/kwen2501/286/base 2025-12-04T09:17:18.5168297Z * [new branch] gh/kwen2501/286/head -> origin/gh/kwen2501/286/head 2025-12-04T09:17:18.5170167Z * [new branch] gh/kwen2501/286/orig -> origin/gh/kwen2501/286/orig 2025-12-04T09:17:18.5172671Z * [new branch] gh/kwen2501/287/base -> origin/gh/kwen2501/287/base 2025-12-04T09:17:18.5174661Z * [new branch] gh/kwen2501/287/head -> origin/gh/kwen2501/287/head 2025-12-04T09:17:18.5176353Z * [new branch] gh/kwen2501/287/orig -> origin/gh/kwen2501/287/orig 2025-12-04T09:17:18.5179094Z * [new branch] gh/kwen2501/288/base -> origin/gh/kwen2501/288/base 
2025-12-04T09:17:18.5182205Z * [new branch] gh/kwen2501/288/head -> origin/gh/kwen2501/288/head 2025-12-04T09:17:18.5183649Z * [new branch] gh/kwen2501/288/orig -> origin/gh/kwen2501/288/orig 2025-12-04T09:17:18.5186694Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-12-04T09:17:18.5188542Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-12-04T09:17:18.5190326Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-12-04T09:17:18.5192803Z * [new branch] gh/laithsakka/276/base -> origin/gh/laithsakka/276/base 2025-12-04T09:17:18.5194618Z * [new branch] gh/laithsakka/276/head -> origin/gh/laithsakka/276/head 2025-12-04T09:17:18.5196405Z * [new branch] gh/laithsakka/276/orig -> origin/gh/laithsakka/276/orig 2025-12-04T09:17:18.5199266Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-12-04T09:17:18.5201634Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-12-04T09:17:18.5203978Z * [new branch] gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-12-04T09:17:18.5205813Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-12-04T09:17:18.5208314Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-12-04T09:17:18.5212796Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-12-04T09:17:18.5215429Z * [new branch] gh/laithsakka/313/base -> origin/gh/laithsakka/313/base 2025-12-04T09:17:18.5217309Z * [new branch] gh/laithsakka/313/head -> origin/gh/laithsakka/313/head 2025-12-04T09:17:18.5219199Z * [new branch] gh/laithsakka/313/orig -> origin/gh/laithsakka/313/orig 2025-12-04T09:17:18.5222013Z * [new branch] gh/laithsakka/316/base -> origin/gh/laithsakka/316/base 2025-12-04T09:17:18.5223866Z * [new branch] gh/laithsakka/316/head -> origin/gh/laithsakka/316/head 2025-12-04T09:17:18.5225699Z * [new branch] gh/laithsakka/316/orig -> origin/gh/laithsakka/316/orig 
2025-12-04T09:17:18.5228207Z * [new branch] gh/laithsakka/317/base -> origin/gh/laithsakka/317/base 2025-12-04T09:17:18.5229938Z * [new branch] gh/laithsakka/317/head -> origin/gh/laithsakka/317/head 2025-12-04T09:17:18.5231683Z * [new branch] gh/laithsakka/317/orig -> origin/gh/laithsakka/317/orig 2025-12-04T09:17:18.5234347Z * [new branch] gh/laithsakka/319/base -> origin/gh/laithsakka/319/base 2025-12-04T09:17:18.5236162Z * [new branch] gh/laithsakka/319/head -> origin/gh/laithsakka/319/head 2025-12-04T09:17:18.5237966Z * [new branch] gh/laithsakka/319/orig -> origin/gh/laithsakka/319/orig 2025-12-04T09:17:18.5240498Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-12-04T09:17:18.5242384Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-12-04T09:17:18.5244975Z * [new branch] gh/laithsakka/320/base -> origin/gh/laithsakka/320/base 2025-12-04T09:17:18.5246812Z * [new branch] gh/laithsakka/320/head -> origin/gh/laithsakka/320/head 2025-12-04T09:17:18.5248823Z * [new branch] gh/laithsakka/320/orig -> origin/gh/laithsakka/320/orig 2025-12-04T09:17:18.5251362Z * [new branch] gh/laithsakka/321/base -> origin/gh/laithsakka/321/base 2025-12-04T09:17:18.5253348Z * [new branch] gh/laithsakka/321/head -> origin/gh/laithsakka/321/head 2025-12-04T09:17:18.5255089Z * [new branch] gh/laithsakka/321/orig -> origin/gh/laithsakka/321/orig 2025-12-04T09:17:18.5257829Z * [new branch] gh/laithsakka/322/base -> origin/gh/laithsakka/322/base 2025-12-04T09:17:18.5259855Z * [new branch] gh/laithsakka/322/head -> origin/gh/laithsakka/322/head 2025-12-04T09:17:18.5262172Z * [new branch] gh/laithsakka/322/orig -> origin/gh/laithsakka/322/orig 2025-12-04T09:17:18.5264964Z * [new branch] gh/laithsakka/323/base -> origin/gh/laithsakka/323/base 2025-12-04T09:17:18.5266851Z * [new branch] gh/laithsakka/323/head -> origin/gh/laithsakka/323/head 2025-12-04T09:17:18.5269140Z * [new branch] gh/laithsakka/323/orig -> origin/gh/laithsakka/323/orig 
2025-12-04T09:17:18.5271701Z * [new branch] gh/laithsakka/324/base -> origin/gh/laithsakka/324/base 2025-12-04T09:17:18.5273466Z * [new branch] gh/laithsakka/324/head -> origin/gh/laithsakka/324/head 2025-12-04T09:17:18.5275464Z * [new branch] gh/laithsakka/324/orig -> origin/gh/laithsakka/324/orig 2025-12-04T09:17:18.5278026Z * [new branch] gh/laithsakka/325/base -> origin/gh/laithsakka/325/base 2025-12-04T09:17:18.5279864Z * [new branch] gh/laithsakka/325/head -> origin/gh/laithsakka/325/head 2025-12-04T09:17:18.5282188Z * [new branch] gh/laithsakka/325/orig -> origin/gh/laithsakka/325/orig 2025-12-04T09:17:18.5285143Z * [new branch] gh/laithsakka/326/base -> origin/gh/laithsakka/326/base 2025-12-04T09:17:18.5287024Z * [new branch] gh/laithsakka/326/head -> origin/gh/laithsakka/326/head 2025-12-04T09:17:18.5288914Z * [new branch] gh/laithsakka/326/orig -> origin/gh/laithsakka/326/orig 2025-12-04T09:17:18.5291518Z * [new branch] gh/laithsakka/327/base -> origin/gh/laithsakka/327/base 2025-12-04T09:17:18.5293441Z * [new branch] gh/laithsakka/327/head -> origin/gh/laithsakka/327/head 2025-12-04T09:17:18.5295283Z * [new branch] gh/laithsakka/327/orig -> origin/gh/laithsakka/327/orig 2025-12-04T09:17:18.5297825Z * [new branch] gh/laithsakka/328/base -> origin/gh/laithsakka/328/base 2025-12-04T09:17:18.5299762Z * [new branch] gh/laithsakka/328/head -> origin/gh/laithsakka/328/head 2025-12-04T09:17:18.5301801Z * [new branch] gh/laithsakka/328/orig -> origin/gh/laithsakka/328/orig 2025-12-04T09:17:18.5304826Z * [new branch] gh/liangel/4/base -> origin/gh/liangel/4/base 2025-12-04T09:17:18.5306712Z * [new branch] gh/liangel/4/head -> origin/gh/liangel/4/head 2025-12-04T09:17:18.5308934Z * [new branch] gh/liangel/4/orig -> origin/gh/liangel/4/orig 2025-12-04T09:17:18.5313960Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-12-04T09:17:18.5315536Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-12-04T09:17:18.5318627Z * 
[new branch] gh/lw/4/base -> origin/gh/lw/4/base 2025-12-04T09:17:18.5320506Z * [new branch] gh/lw/4/head -> origin/gh/lw/4/head 2025-12-04T09:17:18.5322464Z * [new branch] gh/lw/4/orig -> origin/gh/lw/4/orig 2025-12-04T09:17:18.5325162Z * [new branch] gh/lw/5/base -> origin/gh/lw/5/base 2025-12-04T09:17:18.5327012Z * [new branch] gh/lw/5/head -> origin/gh/lw/5/head 2025-12-04T09:17:18.5328874Z * [new branch] gh/lw/5/orig -> origin/gh/lw/5/orig 2025-12-04T09:17:18.5331399Z * [new branch] gh/lw/6/base -> origin/gh/lw/6/base 2025-12-04T09:17:18.5333388Z * [new branch] gh/lw/6/head -> origin/gh/lw/6/head 2025-12-04T09:17:18.5335086Z * [new branch] gh/lw/6/orig -> origin/gh/lw/6/orig 2025-12-04T09:17:18.5338090Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-12-04T09:17:18.5340793Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-12-04T09:17:18.5342523Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-12-04T09:17:18.5344572Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-12-04T09:17:18.5346802Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-12-04T09:17:18.5349108Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-12-04T09:17:18.5350644Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-12-04T09:17:18.5353200Z * [new branch] gh/malfet/517/base -> origin/gh/malfet/517/base 2025-12-04T09:17:18.5355025Z * [new branch] gh/malfet/517/head -> origin/gh/malfet/517/head 2025-12-04T09:17:18.5357541Z * [new branch] gh/malfet/528/base -> origin/gh/malfet/528/base 2025-12-04T09:17:18.5359291Z * [new branch] gh/malfet/528/head -> origin/gh/malfet/528/head 2025-12-04T09:17:18.5361080Z * [new branch] gh/malfet/528/orig -> origin/gh/malfet/528/orig 2025-12-04T09:17:18.5363578Z * [new branch] gh/malfet/537/base -> origin/gh/malfet/537/base 2025-12-04T09:17:18.5365552Z * [new branch] gh/malfet/537/head -> origin/gh/malfet/537/head 
2025-12-04T09:17:18.5367561Z * [new branch] gh/malfet/537/orig -> origin/gh/malfet/537/orig 2025-12-04T09:17:18.5369893Z * [new branch] gh/malfet/546/base -> origin/gh/malfet/546/base 2025-12-04T09:17:18.5371605Z * [new branch] gh/malfet/546/head -> origin/gh/malfet/546/head 2025-12-04T09:17:18.5373401Z * [new branch] gh/malfet/546/orig -> origin/gh/malfet/546/orig 2025-12-04T09:17:18.5375935Z * [new branch] gh/malfet/565/base -> origin/gh/malfet/565/base 2025-12-04T09:17:18.5377874Z * [new branch] gh/malfet/565/head -> origin/gh/malfet/565/head 2025-12-04T09:17:18.5379891Z * [new branch] gh/malfet/565/orig -> origin/gh/malfet/565/orig 2025-12-04T09:17:18.5382463Z * [new branch] gh/malfet/575/base -> origin/gh/malfet/575/base 2025-12-04T09:17:18.5384258Z * [new branch] gh/malfet/575/head -> origin/gh/malfet/575/head 2025-12-04T09:17:18.5385999Z * [new branch] gh/malfet/575/orig -> origin/gh/malfet/575/orig 2025-12-04T09:17:18.5388570Z * [new branch] gh/malfet/580/base -> origin/gh/malfet/580/base 2025-12-04T09:17:18.5390417Z * [new branch] gh/malfet/580/head -> origin/gh/malfet/580/head 2025-12-04T09:17:18.5392198Z * [new branch] gh/malfet/580/orig -> origin/gh/malfet/580/orig 2025-12-04T09:17:18.5394655Z * [new branch] gh/malfet/581/base -> origin/gh/malfet/581/base 2025-12-04T09:17:18.5396520Z * [new branch] gh/malfet/581/head -> origin/gh/malfet/581/head 2025-12-04T09:17:18.5398415Z * [new branch] gh/malfet/581/orig -> origin/gh/malfet/581/orig 2025-12-04T09:17:18.5400869Z * [new branch] gh/malfet/583/base -> origin/gh/malfet/583/base 2025-12-04T09:17:18.5402691Z * [new branch] gh/malfet/583/head -> origin/gh/malfet/583/head 2025-12-04T09:17:18.5404466Z * [new branch] gh/malfet/583/orig -> origin/gh/malfet/583/orig 2025-12-04T09:17:18.5407397Z * [new branch] gh/malfet/586/base -> origin/gh/malfet/586/base 2025-12-04T09:17:18.5409634Z * [new branch] gh/malfet/586/head -> origin/gh/malfet/586/head 2025-12-04T09:17:18.5411415Z * [new branch] gh/malfet/586/orig -> 
origin/gh/malfet/586/orig 2025-12-04T09:17:18.5413851Z * [new branch] gh/malfet/587/base -> origin/gh/malfet/587/base 2025-12-04T09:17:18.5415710Z * [new branch] gh/malfet/587/head -> origin/gh/malfet/587/head 2025-12-04T09:17:18.5417588Z * [new branch] gh/malfet/587/orig -> origin/gh/malfet/587/orig 2025-12-04T09:17:18.5420200Z * [new branch] gh/malfet/588/base -> origin/gh/malfet/588/base 2025-12-04T09:17:18.5421975Z * [new branch] gh/malfet/588/head -> origin/gh/malfet/588/head 2025-12-04T09:17:18.5424380Z * [new branch] gh/malfet/588/orig -> origin/gh/malfet/588/orig 2025-12-04T09:17:18.5427137Z * [new branch] gh/malfet/589/base -> origin/gh/malfet/589/base 2025-12-04T09:17:18.5429031Z * [new branch] gh/malfet/589/head -> origin/gh/malfet/589/head 2025-12-04T09:17:18.5431012Z * [new branch] gh/malfet/589/orig -> origin/gh/malfet/589/orig 2025-12-04T09:17:18.5433369Z * [new branch] gh/malfet/590/base -> origin/gh/malfet/590/base 2025-12-04T09:17:18.5435295Z * [new branch] gh/malfet/590/head -> origin/gh/malfet/590/head 2025-12-04T09:17:18.5437583Z * [new branch] gh/malfet/590/orig -> origin/gh/malfet/590/orig 2025-12-04T09:17:18.5440653Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base 2025-12-04T09:17:18.5442485Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head 2025-12-04T09:17:18.5444304Z * [new branch] gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T09:17:18.5446985Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T09:17:18.5448856Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T09:17:18.5450856Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T09:17:18.5453436Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T09:17:18.5455286Z * [new branch] gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T09:17:18.5457213Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T09:17:18.5460011Z * [new 
branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T09:17:18.5461742Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T09:17:18.5463572Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T09:17:18.5466299Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T09:17:18.5468033Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T09:17:18.5469872Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T09:17:18.5472462Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T09:17:18.5474349Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T09:17:18.5476346Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T09:17:18.5478968Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T09:17:18.5480784Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T09:17:18.5482726Z * [new branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T09:17:18.5485230Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T09:17:18.5487837Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T09:17:18.5489386Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T09:17:18.5492165Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T09:17:18.5493976Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T09:17:18.5495876Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T09:17:18.5498725Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T09:17:18.5500741Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T09:17:18.5502678Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T09:17:18.5505217Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 
2025-12-04T09:17:18.5506995Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T09:17:18.5509000Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T09:17:18.5511834Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T09:17:18.5513585Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T09:17:18.5515672Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T09:17:18.5518012Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T09:17:18.5519599Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T09:17:18.5521499Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T09:17:18.5524051Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T09:17:18.5525861Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 2025-12-04T09:17:18.5527754Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 2025-12-04T09:17:18.5530521Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T09:17:18.5532280Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T09:17:18.5534054Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T09:17:18.5536644Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 2025-12-04T09:17:18.5538567Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T09:17:18.5540946Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T09:17:18.5543608Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T09:17:18.5545160Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T09:17:18.5547106Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T09:17:18.5549739Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T09:17:18.5551551Z * [new branch] gh/malfet/608/head -> 
origin/gh/malfet/608/head 2025-12-04T09:17:18.5553488Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T09:17:18.5556146Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T09:17:18.5557971Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T09:17:18.5559919Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T09:17:18.5562625Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T09:17:18.5564517Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T09:17:18.5566303Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T09:17:18.5568911Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T09:17:18.5570688Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T09:17:18.5573239Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T09:17:18.5575648Z * [new branch] gh/malfet/612/base -> origin/gh/malfet/612/base 2025-12-04T09:17:18.5577538Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T09:17:18.5579870Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T09:17:18.5582408Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T09:17:18.5584176Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T09:17:18.5587310Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T09:17:18.5589109Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T09:17:18.5590997Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T09:17:18.5594314Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T09:17:18.5597466Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T09:17:18.5599294Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 
2025-12-04T09:17:18.5601514Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T09:17:18.5604722Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T09:17:18.5606361Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T09:17:18.5608842Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T09:17:18.5613238Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-12-04T09:17:18.5615705Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T09:17:18.5617517Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T09:17:18.5620080Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T09:17:18.5621961Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T09:17:18.5624377Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-12-04T09:17:18.5626257Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-12-04T09:17:18.5628618Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T09:17:18.5630255Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T09:17:18.5632656Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T09:17:18.5634363Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T09:17:18.5637548Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T09:17:18.5639345Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T09:17:18.5641812Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T09:17:18.5643849Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T09:17:18.5646046Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T09:17:18.5647914Z 
* [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T09:17:18.5650335Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T09:17:18.5652068Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T09:17:18.5654565Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T09:17:18.5656396Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T09:17:18.5658917Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T09:17:18.5661081Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T09:17:18.5662872Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T09:17:18.5665522Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T09:17:18.5667334Z * [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T09:17:18.5669169Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T09:17:18.5671974Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T09:17:18.5674386Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T09:17:18.5676351Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T09:17:18.5678988Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T09:17:18.5680782Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T09:17:18.5682704Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T09:17:18.5685371Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 
2025-12-04T09:17:18.5687227Z * [new branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T09:17:18.5689168Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T09:17:18.5691725Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T09:17:18.5693492Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T09:17:18.5695352Z * [new branch] gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T09:17:18.5698089Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T09:17:18.5700140Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T09:17:18.5701951Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T09:17:18.5704964Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 2025-12-04T09:17:18.5706855Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T09:17:18.5709048Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T09:17:18.5714052Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T09:17:18.5716254Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T09:17:18.5718256Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T09:17:18.5721041Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T09:17:18.5723089Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T09:17:18.5725100Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T09:17:18.5727458Z * [new branch] gh/mikaylagawarecki/354/base -> 
origin/gh/mikaylagawarecki/354/base 2025-12-04T09:17:18.5729311Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T09:17:18.5731216Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T09:17:18.5734383Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T09:17:18.5736347Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T09:17:18.5738193Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T09:17:18.5740857Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T09:17:18.5742808Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T09:17:18.5744755Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T09:17:18.5747389Z * [new branch] gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base 2025-12-04T09:17:18.5749299Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T09:17:18.5751179Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T09:17:18.5753774Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T09:17:18.5755719Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T09:17:18.5757515Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 2025-12-04T09:17:18.5760420Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T09:17:18.5762418Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T09:17:18.5764185Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T09:17:18.5766829Z * [new branch] 
gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 2025-12-04T09:17:18.5768921Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T09:17:18.5770794Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T09:17:18.5773738Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T09:17:18.5775791Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T09:17:18.5777669Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T09:17:18.5780937Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T09:17:18.5782729Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T09:17:18.5784563Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T09:17:18.5787416Z * [new branch] gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T09:17:18.5789316Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T09:17:18.5791233Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T09:17:18.5793893Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T09:17:18.5795823Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T09:17:18.5797616Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T09:17:18.5800184Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T09:17:18.5802077Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T09:17:18.5803882Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 
2025-12-04T09:17:18.5806570Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T09:17:18.5808726Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T09:17:18.5812395Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T09:17:18.5815119Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T09:17:18.5817105Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T09:17:18.5818877Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T09:17:18.5821914Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T09:17:18.5823990Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T09:17:18.5826001Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 2025-12-04T09:17:18.5828597Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T09:17:18.5830342Z * [new branch] gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head 2025-12-04T09:17:18.5832119Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T09:17:18.5834831Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T09:17:18.5836829Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T09:17:18.5838496Z * [new branch] gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig 2025-12-04T09:17:18.5840999Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T09:17:18.5843173Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T09:17:18.5845057Z * [new branch] gh/mikaylagawarecki/373/orig -> 
origin/gh/mikaylagawarecki/373/orig 2025-12-04T09:17:18.5847793Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T09:17:18.5849624Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T09:17:18.5851503Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T09:17:18.5854066Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T09:17:18.5856023Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T09:17:18.5857889Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T09:17:18.5860633Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T09:17:18.5862772Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T09:17:18.5864635Z * [new branch] gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig 2025-12-04T09:17:18.5867640Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T09:17:18.5869556Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T09:17:18.5871426Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T09:17:18.5874092Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T09:17:18.5875973Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T09:17:18.5877813Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T09:17:18.5880363Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T09:17:18.5882236Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T09:17:18.5884064Z * [new branch] 
gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T09:17:18.5886515Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T09:17:18.5888517Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T09:17:18.5890247Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T09:17:18.5893085Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T09:17:18.5894952Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T09:17:18.5896835Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T09:17:18.5899318Z * [new branch] gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T09:17:18.5901246Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T09:17:18.5903229Z * [new branch] gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T09:17:18.5905755Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T09:17:18.5907664Z * [new branch] gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T09:17:18.5909719Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T09:17:18.5912255Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T09:17:18.5914104Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T09:17:18.5915952Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T09:17:18.5918480Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T09:17:18.5920545Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 
2025-12-04T09:17:18.5922214Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T09:17:18.5924974Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T09:17:18.5926846Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T09:17:18.5928587Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T09:17:18.5931331Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T09:17:18.5932962Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T09:17:18.5934825Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T09:17:18.5937305Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T09:17:18.5939749Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 2025-12-04T09:17:18.5941663Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T09:17:18.5944491Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T09:17:18.5946491Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T09:17:18.5948450Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T09:17:18.5951113Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T09:17:18.5952999Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T09:17:18.5954683Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T09:17:18.5957545Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T09:17:18.5959423Z * [new branch] gh/mikaylagawarecki/391/head -> 
origin/gh/mikaylagawarecki/391/head 2025-12-04T09:17:18.5961235Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T09:17:18.5963804Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T09:17:18.5965921Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 2025-12-04T09:17:18.5967582Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T09:17:18.5970632Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T09:17:18.5972447Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T09:17:18.5974286Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T09:17:18.5976876Z * [new branch] gh/mlazos/42/base -> origin/gh/mlazos/42/base 2025-12-04T09:17:18.5978685Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T09:17:18.5980618Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 2025-12-04T09:17:18.5983008Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T09:17:18.5984796Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T09:17:18.5986687Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T09:17:18.5989035Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T09:17:18.5990836Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T09:17:18.5992643Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T09:17:18.5995010Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T09:17:18.5996965Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T09:17:18.5998758Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T09:17:18.6001257Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T09:17:18.6003278Z * [new branch] gh/mlazos/48/head -> 
origin/gh/mlazos/48/head 2025-12-04T09:17:18.6005341Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T09:17:18.6007556Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T09:17:18.6009559Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T09:17:18.6011713Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T09:17:18.6013903Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T09:17:18.6015634Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T09:17:18.6017496Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T09:17:18.6020537Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T09:17:18.6022366Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T09:17:18.6024149Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T09:17:18.6026746Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 2025-12-04T09:17:18.6028577Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T09:17:18.6030865Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T09:17:18.6033350Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T09:17:18.6035135Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T09:17:18.6036935Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T09:17:18.6039324Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T09:17:18.6041297Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T09:17:18.6043135Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T09:17:18.6045568Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T09:17:18.6047383Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 2025-12-04T09:17:18.6049158Z * [new branch] gh/mlazos/55/orig -> 
origin/gh/mlazos/55/orig 2025-12-04T09:17:18.6051735Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T09:17:18.6053628Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T09:17:18.6055486Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T09:17:18.6057940Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 2025-12-04T09:17:18.6059897Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T09:17:18.6061696Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T09:17:18.6064839Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T09:17:18.6067185Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T09:17:18.6069021Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T09:17:18.6071567Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T09:17:18.6073395Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 2025-12-04T09:17:18.6075204Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T09:17:18.6077837Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T09:17:18.6079803Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T09:17:18.6081470Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T09:17:18.6084477Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T09:17:18.6086331Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T09:17:18.6088124Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T09:17:18.6090695Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T09:17:18.6092944Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T09:17:18.6094780Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T09:17:18.6097341Z * [new branch] gh/mlazos/63/base -> 
origin/gh/mlazos/63/base 2025-12-04T09:17:18.6100095Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T09:17:18.6101912Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T09:17:18.6104489Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T09:17:18.6106368Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T09:17:18.6108298Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T09:17:18.6110960Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T09:17:18.6112791Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T09:17:18.6114573Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T09:17:18.6117190Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T09:17:18.6118981Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T09:17:18.6120767Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 2025-12-04T09:17:18.6129719Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T09:17:18.6130048Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T09:17:18.6130354Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T09:17:18.6130556Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T09:17:18.6131359Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T09:17:18.6133356Z * [new branch] gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T09:17:18.6135868Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T09:17:18.6137849Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T09:17:18.6139631Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T09:17:18.6142269Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T09:17:18.6144085Z * [new branch] gh/mlazos/70/head -> 
origin/gh/mlazos/70/head 2025-12-04T09:17:18.6145922Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T09:17:18.6148488Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T09:17:18.6150420Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T09:17:18.6152038Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T09:17:18.6154684Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T09:17:18.6156696Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T09:17:18.6158311Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T09:17:18.6161140Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T09:17:18.6162888Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T09:17:18.6164707Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T09:17:18.6167828Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-12-04T09:17:18.6169763Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T09:17:18.6172897Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T09:17:18.6174875Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T09:17:18.6176842Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T09:17:18.6180168Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T09:17:18.6182036Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T09:17:18.6183986Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T09:17:18.6186473Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T09:17:18.6188308Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T09:17:18.6190168Z * [new branch] gh/naveenthangudu/2/orig -> 
origin/gh/naveenthangudu/2/orig 2025-12-04T09:17:18.6192702Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T09:17:18.6194464Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T09:17:18.6196489Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T09:17:18.6198911Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T09:17:18.6200787Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T09:17:18.6202824Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T09:17:18.6205302Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T09:17:18.6207143Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T09:17:18.6211902Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T09:17:18.6213904Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-12-04T09:17:18.6214160Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T09:17:18.6214927Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T09:17:18.6217741Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T09:17:18.6219544Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T09:17:18.6221370Z * [new branch] gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T09:17:18.6223699Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T09:17:18.6225633Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T09:17:18.6227635Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T09:17:18.6230857Z * [new branch] gh/naveenthangudu/9/base -> 
origin/gh/naveenthangudu/9/base 2025-12-04T09:17:18.6232406Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T09:17:18.6234347Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T09:17:18.6237461Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T09:17:18.6239617Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T09:17:18.6241526Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T09:17:18.6244090Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T09:17:18.6245877Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T09:17:18.6247769Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T09:17:18.6250240Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T09:17:18.6252164Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T09:17:18.6254632Z * [new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T09:17:18.6257512Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T09:17:18.6259500Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T09:17:18.6261295Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T09:17:18.6263837Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T09:17:18.6265780Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T09:17:18.6267624Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T09:17:18.6270217Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T09:17:18.6272004Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T09:17:18.6274368Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T09:17:18.6276739Z * [new 
branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T09:17:18.6278652Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T09:17:18.6280472Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T09:17:18.6282938Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T09:17:18.6284811Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T09:17:18.6286547Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T09:17:18.6289061Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 2025-12-04T09:17:18.6290952Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T09:17:18.6292731Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T09:17:18.6295159Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T09:17:18.6297025Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 2025-12-04T09:17:18.6298846Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 2025-12-04T09:17:18.6301547Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T09:17:18.6303466Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T09:17:18.6305457Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T09:17:18.6307906Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T09:17:18.6309950Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T09:17:18.6311853Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T09:17:18.6314432Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T09:17:18.6316224Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T09:17:18.6318186Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T09:17:18.6320584Z * [new branch] 
gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T09:17:18.6322374Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T09:17:18.6324212Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T09:17:18.6327258Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T09:17:18.6329082Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T09:17:18.6330915Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T09:17:18.6333328Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T09:17:18.6335816Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T09:17:18.6337622Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T09:17:18.6340177Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T09:17:18.6341992Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T09:17:18.6344046Z * [new branch] gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T09:17:18.6346341Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T09:17:18.6348152Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T09:17:18.6349968Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T09:17:18.6352550Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T09:17:18.6354395Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T09:17:18.6356597Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T09:17:18.6358817Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T09:17:18.6360637Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T09:17:18.6362392Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T09:17:18.6364789Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T09:17:18.6366636Z * [new branch] 
gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T09:17:18.6368427Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T09:17:18.6370860Z * [new branch] gh/oulgen/17/base -> origin/gh/oulgen/17/base 2025-12-04T09:17:18.6372650Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T09:17:18.6374842Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T09:17:18.6377103Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T09:17:18.6378928Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T09:17:18.6381037Z * [new branch] gh/oulgen/18/orig -> origin/gh/oulgen/18/orig 2025-12-04T09:17:18.6383327Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T09:17:18.6385141Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T09:17:18.6387128Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T09:17:18.6390055Z * [new branch] gh/oulgen/20/base -> origin/gh/oulgen/20/base 2025-12-04T09:17:18.6391918Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T09:17:18.6393923Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T09:17:18.6396205Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T09:17:18.6397992Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T09:17:18.6399816Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T09:17:18.6402366Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T09:17:18.6404763Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T09:17:18.6406433Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T09:17:18.6410033Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T09:17:18.6411925Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T09:17:18.6413699Z * [new branch] gh/oulgen/23/orig 
-> origin/gh/oulgen/23/orig 2025-12-04T09:17:18.6416209Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T09:17:18.6418127Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T09:17:18.6420041Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T09:17:18.6422585Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T09:17:18.6424375Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T09:17:18.6426466Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T09:17:18.6428993Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T09:17:18.6430589Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T09:17:18.6432417Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T09:17:18.6434870Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T09:17:18.6436695Z * [new branch] gh/oulgen/4/head -> origin/gh/oulgen/4/head 2025-12-04T09:17:18.6438468Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T09:17:18.6441554Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T09:17:18.6443337Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T09:17:18.6445129Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T09:17:18.6447797Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T09:17:18.6449641Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T09:17:18.6451608Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T09:17:18.6454125Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T09:17:18.6455941Z * [new branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T09:17:18.6457980Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T09:17:18.6460549Z * [new branch] gh/patvig/mtia-serialization -> 
origin/gh/patvig/mtia-serialization 2025-12-04T09:17:18.6463934Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T09:17:18.6465940Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-12-04T09:17:18.6467778Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T09:17:18.6470248Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T09:17:18.6472038Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T09:17:18.6473905Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T09:17:18.6476825Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T09:17:18.6478373Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T09:17:18.6480330Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T09:17:18.6482819Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T09:17:18.6484506Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-12-04T09:17:18.6486470Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T09:17:18.6489049Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T09:17:18.6491141Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T09:17:18.6492683Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T09:17:18.6495187Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T09:17:18.6497038Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T09:17:18.6498860Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T09:17:18.6501665Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T09:17:18.6503409Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T09:17:18.6505320Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T09:17:18.6507863Z * [new branch] gh/pearu/117/base -> 
origin/gh/pearu/117/base 2025-12-04T09:17:18.6511960Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T09:17:18.6513923Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T09:17:18.6516478Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T09:17:18.6518256Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T09:17:18.6520084Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T09:17:18.6522603Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T09:17:18.6524787Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T09:17:18.6526643Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T09:17:18.6529246Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T09:17:18.6531040Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T09:17:18.6532864Z * [new branch] gh/pearu/139/orig -> origin/gh/pearu/139/orig 2025-12-04T09:17:18.6535374Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T09:17:18.6537368Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T09:17:18.6539166Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T09:17:18.6541744Z * [new branch] gh/pearu/142/base -> origin/gh/pearu/142/base 2025-12-04T09:17:18.6543585Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T09:17:18.6545429Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T09:17:18.6547924Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T09:17:18.6549722Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T09:17:18.6551614Z * [new branch] gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T09:17:18.6554237Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T09:17:18.6556073Z * [new branch] gh/pearu/147/head -> 
origin/gh/pearu/147/head 2025-12-04T09:17:18.6557923Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T09:17:18.6560441Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T09:17:18.6562245Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T09:17:18.6564259Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T09:17:18.6567267Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T09:17:18.6569151Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T09:17:18.6570913Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T09:17:18.6574268Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T09:17:18.6576677Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T09:17:18.6578405Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T09:17:18.6581302Z * [new branch] gh/pearu/152/base -> origin/gh/pearu/152/base 2025-12-04T09:17:18.6583138Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T09:17:18.6585090Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T09:17:18.6587579Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T09:17:18.6589385Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T09:17:18.6591185Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T09:17:18.6593719Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T09:17:18.6595534Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T09:17:18.6597344Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T09:17:18.6600011Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T09:17:18.6601861Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T09:17:18.6603626Z * [new branch] gh/pearu/155/orig -> 
origin/gh/pearu/155/orig 2025-12-04T09:17:18.6606276Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T09:17:18.6608214Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T09:17:18.6610165Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T09:17:18.6613057Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T09:17:18.6615758Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T09:17:18.6617447Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T09:17:18.6620540Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T09:17:18.6622531Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T09:17:18.6624281Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T09:17:18.6627295Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 2025-12-04T09:17:18.6629097Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 2025-12-04T09:17:18.6631762Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T09:17:18.6633568Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T09:17:18.6635450Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T09:17:18.6637946Z * [new branch] gh/pianpwk/29/base -> origin/gh/pianpwk/29/base 2025-12-04T09:17:18.6639822Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T09:17:18.6641669Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T09:17:18.6644376Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T09:17:18.6646232Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T09:17:18.6648087Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T09:17:18.6650634Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T09:17:18.6652470Z * [new branch] gh/pianpwk/31/head -> 
origin/gh/pianpwk/31/head 2025-12-04T09:17:18.6654279Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T09:17:18.6656731Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T09:17:18.6658560Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T09:17:18.6660540Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T09:17:18.6662872Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T09:17:18.6664685Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T09:17:18.6666464Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T09:17:18.6669271Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T09:17:18.6671377Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T09:17:18.6673461Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T09:17:18.6675939Z * [new branch] gh/pianpwk/35/base -> origin/gh/pianpwk/35/base 2025-12-04T09:17:18.6677926Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T09:17:18.6679744Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T09:17:18.6682818Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T09:17:18.6684694Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T09:17:18.6687191Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T09:17:18.6688979Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T09:17:18.6690733Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T09:17:18.6693804Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T09:17:18.6695536Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T09:17:18.6697329Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T09:17:18.6699976Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 
2025-12-04T09:17:18.6701805Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T09:17:18.6703689Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T09:17:18.6706203Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T09:17:18.6708340Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T09:17:18.6710078Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T09:17:18.6712610Z * [new branch] gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T09:17:18.6714340Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T09:17:18.6716230Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T09:17:18.6718700Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T09:17:18.6720540Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T09:17:18.6722271Z * [new branch] gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T09:17:18.6724844Z * [new branch] gh/rec/169/base -> origin/gh/rec/169/base 2025-12-04T09:17:18.6726780Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T09:17:18.6728554Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T09:17:18.6731179Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T09:17:18.6732972Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T09:17:18.6734827Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T09:17:18.6737345Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T09:17:18.6739231Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T09:17:18.6741193Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T09:17:18.6744164Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T09:17:18.6746055Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T09:17:18.6747788Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 
2025-12-04T09:17:18.6750307Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T09:17:18.6752097Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T09:17:18.6753936Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T09:17:18.6756522Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T09:17:18.6758341Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T09:17:18.6760190Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T09:17:18.6762664Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T09:17:18.6764516Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T09:17:18.6766320Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T09:17:18.6768989Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T09:17:18.6770585Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T09:17:18.6772375Z * [new branch] gh/rec/176/orig -> origin/gh/rec/176/orig 2025-12-04T09:17:18.6774903Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T09:17:18.6776738Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T09:17:18.6778552Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T09:17:18.6781933Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T09:17:18.6783779Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T09:17:18.6785670Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T09:17:18.6788137Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T09:17:18.6789987Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T09:17:18.6791809Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T09:17:18.6794266Z * [new branch] gh/robert-hardwick/5/base -> 
origin/gh/robert-hardwick/5/base 2025-12-04T09:17:18.6796108Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T09:17:18.6798030Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T09:17:18.6800538Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T09:17:18.6802918Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 2025-12-04T09:17:18.6804760Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T09:17:18.6807272Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T09:17:18.6809230Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T09:17:18.6811131Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T09:17:18.6813528Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 2025-12-04T09:17:18.6815593Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T09:17:18.6817416Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T09:17:18.6820103Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T09:17:18.6822040Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T09:17:18.6823733Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T09:17:18.6826825Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T09:17:18.6828811Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T09:17:18.6831274Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T09:17:18.6833048Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T09:17:18.6836044Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T09:17:18.6837824Z * [new 
branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T09:17:18.6839592Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T09:17:18.6842000Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T09:17:18.6844040Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T09:17:18.6845688Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T09:17:18.6848109Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T09:17:18.6849921Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T09:17:18.6851972Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T09:17:18.6854521Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T09:17:18.6856423Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T09:17:18.6858255Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T09:17:18.6860908Z * [new branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T09:17:18.6862698Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T09:17:18.6864475Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T09:17:18.6867495Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T09:17:18.6869316Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T09:17:18.6871123Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T09:17:18.6873610Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 2025-12-04T09:17:18.6875419Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T09:17:18.6877279Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T09:17:18.6880432Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T09:17:18.6882264Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T09:17:18.6884047Z * [new branch] 
gh/rtimpe/29/orig -> origin/gh/rtimpe/29/orig 2025-12-04T09:17:18.6886546Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T09:17:18.6888289Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T09:17:18.6890793Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T09:17:18.6893050Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T09:17:18.6894867Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T09:17:18.6897331Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T09:17:18.6899221Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T09:17:18.6901205Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T09:17:18.6903746Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T09:17:18.6905590Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T09:17:18.6907342Z * [new branch] gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T09:17:18.6910170Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T09:17:18.6911993Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T09:17:18.6913846Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T09:17:18.6916232Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T09:17:18.6918095Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T09:17:18.6920115Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T09:17:18.6922441Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T09:17:18.6924307Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T09:17:18.6926142Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T09:17:18.6928732Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T09:17:18.6930479Z * [new branch] gh/rtimpe/4/head -> 
origin/gh/rtimpe/4/head 2025-12-04T09:17:18.6933586Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T09:17:18.6935531Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T09:17:18.6937371Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T09:17:18.6940042Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T09:17:18.6941936Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T09:17:18.6943729Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T09:17:18.6946333Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T09:17:18.6948132Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T09:17:18.6950046Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T09:17:18.6952601Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T09:17:18.6954447Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-12-04T09:17:18.6956232Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T09:17:18.6958820Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T09:17:18.6960642Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T09:17:18.6962438Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-12-04T09:17:18.6964876Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T09:17:18.6966663Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T09:17:18.6968478Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T09:17:18.6970953Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T09:17:18.6972753Z * [new 
branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T09:17:18.6974591Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T09:17:18.6977853Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T09:17:18.6979734Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T09:17:18.6981580Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T09:17:18.6984044Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T09:17:18.6985872Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T09:17:18.6987692Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T09:17:18.6990192Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T09:17:18.6992028Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T09:17:18.6994037Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T09:17:18.6996849Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T09:17:18.6998588Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T09:17:18.7000456Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T09:17:18.7002958Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T09:17:18.7004816Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T09:17:18.7006683Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T09:17:18.7009310Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T09:17:18.7011215Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T09:17:18.7013069Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T09:17:18.7015603Z * [new branch] gh/seemethere/63/base 
-> origin/gh/seemethere/63/base 2025-12-04T09:17:18.7017427Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T09:17:18.7019401Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T09:17:18.7021954Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T09:17:18.7023726Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T09:17:18.7025524Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T09:17:18.7028263Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T09:17:18.7030055Z * [new branch] gh/seemethere/72/head -> origin/gh/seemethere/72/head 2025-12-04T09:17:18.7031858Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T09:17:18.7034380Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T09:17:18.7036204Z * [new branch] gh/seemethere/73/head -> origin/gh/seemethere/73/head 2025-12-04T09:17:18.7037973Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 2025-12-04T09:17:18.7040526Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T09:17:18.7042339Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T09:17:18.7044185Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T09:17:18.7046613Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T09:17:18.7048444Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T09:17:18.7050283Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T09:17:18.7053046Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T09:17:18.7054806Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T09:17:18.7056678Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 
2025-12-04T09:17:18.7060029Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T09:17:18.7061947Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T09:17:18.7063827Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T09:17:18.7067200Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T09:17:18.7069109Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T09:17:18.7070969Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T09:17:18.7073583Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T09:17:18.7075434Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T09:17:18.7077438Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T09:17:18.7079990Z * [new branch] gh/shunting314/253/base -> origin/gh/shunting314/253/base 2025-12-04T09:17:18.7082279Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T09:17:18.7084155Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T09:17:18.7086649Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T09:17:18.7088459Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T09:17:18.7090292Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T09:17:18.7093103Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T09:17:18.7094939Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T09:17:18.7096734Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T09:17:18.7099490Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T09:17:18.7101285Z * [new branch] 
gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T09:17:18.7103310Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T09:17:18.7105616Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 2025-12-04T09:17:18.7107419Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T09:17:18.7110793Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T09:17:18.7113592Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T09:17:18.7115480Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 2025-12-04T09:17:18.7117600Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T09:17:18.7119932Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T09:17:18.7121919Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T09:17:18.7123677Z * [new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T09:17:18.7126213Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T09:17:18.7128129Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T09:17:18.7130220Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T09:17:18.7132734Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T09:17:18.7134714Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T09:17:18.7136602Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T09:17:18.7139284Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T09:17:18.7141310Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T09:17:18.7143117Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 
2025-12-04T09:17:18.7146130Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T09:17:18.7147633Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T09:17:18.7149630Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T09:17:18.7152422Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T09:17:18.7154607Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T09:17:18.7156503Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T09:17:18.7159357Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T09:17:18.7161407Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T09:17:18.7163138Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T09:17:18.7166212Z * [new branch] gh/shunting314/268/base -> origin/gh/shunting314/268/base 2025-12-04T09:17:18.7168229Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T09:17:18.7170059Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T09:17:18.7173156Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T09:17:18.7175055Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T09:17:18.7176852Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T09:17:18.7180123Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T09:17:18.7182490Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T09:17:18.7184746Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-12-04T09:17:18.7186536Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T09:17:18.7188917Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 
2025-12-04T09:17:18.7190672Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T09:17:18.7193089Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T09:17:18.7195511Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T09:17:18.7198462Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T09:17:18.7200241Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T09:17:18.7202290Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T09:17:18.7204757Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T09:17:18.7206662Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T09:17:18.7208644Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T09:17:18.7211663Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T09:17:18.7213143Z * [new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T09:17:18.7214927Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T09:17:18.7217545Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T09:17:18.7219789Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T09:17:18.7221854Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T09:17:18.7224080Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T09:17:18.7225804Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T09:17:18.7227693Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T09:17:18.7230276Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T09:17:18.7232171Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T09:17:18.7234071Z * [new branch] gh/slayton58/46/orig -> 
origin/gh/slayton58/46/orig 2025-12-04T09:17:18.7236535Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T09:17:18.7238419Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T09:17:18.7240763Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T09:17:18.7242614Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T09:17:18.7245784Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T09:17:18.7247569Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T09:17:18.7249748Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T09:17:18.7252163Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T09:17:18.7253970Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T09:17:18.7255758Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-12-04T09:17:18.7258720Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T09:17:18.7260514Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T09:17:18.7262354Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T09:17:18.7265183Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-12-04T09:17:18.7267084Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T09:17:18.7268945Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T09:17:18.7271501Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T09:17:18.7273583Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-12-04T09:17:18.7275461Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T09:17:18.7277857Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 
2025-12-04T09:17:18.7279871Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T09:17:18.7281753Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T09:17:18.7284355Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T09:17:18.7286627Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T09:17:18.7287996Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T09:17:18.7290573Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T09:17:18.7292370Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T09:17:18.7294268Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T09:17:18.7296875Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T09:17:18.7298741Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T09:17:18.7300698Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T09:17:18.7303207Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T09:17:18.7305050Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T09:17:18.7306816Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T09:17:18.7310196Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T09:17:18.7312302Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T09:17:18.7313946Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T09:17:18.7316607Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T09:17:18.7318527Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T09:17:18.7320275Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T09:17:18.7323435Z * [new 
branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T09:17:18.7325618Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T09:17:18.7327521Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T09:17:18.7329600Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T09:17:18.7331456Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T09:17:18.7333222Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T09:17:18.7335747Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T09:17:18.7337564Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T09:17:18.7339620Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T09:17:18.7341993Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-12-04T09:17:18.7343966Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-12-04T09:17:18.7345802Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T09:17:18.7349072Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T09:17:18.7351336Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-12-04T09:17:18.7352877Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T09:17:18.7355862Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T09:17:18.7357970Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T09:17:18.7359751Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T09:17:18.7362216Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T09:17:18.7364197Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T09:17:18.7365962Z * [new branch] gh/soulitzer/374/orig -> 
origin/gh/soulitzer/374/orig 2025-12-04T09:17:18.7368471Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T09:17:18.7370728Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T09:17:18.7372130Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T09:17:18.7374784Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T09:17:18.7376645Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T09:17:18.7378254Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T09:17:18.7381063Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T09:17:18.7382868Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T09:17:18.7385171Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T09:17:18.7387983Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 2025-12-04T09:17:18.7389776Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T09:17:18.7391628Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T09:17:18.7394168Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T09:17:18.7396015Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T09:17:18.7397825Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T09:17:18.7400301Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T09:17:18.7402225Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T09:17:18.7403925Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T09:17:18.7406517Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T09:17:18.7408778Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 
2025-12-04T09:17:18.7410563Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T09:17:18.7412963Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T09:17:18.7414755Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T09:17:18.7416649Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T09:17:18.7419246Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 2025-12-04T09:17:18.7421223Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T09:17:18.7423057Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T09:17:18.7425551Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T09:17:18.7427469Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 2025-12-04T09:17:18.7429200Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T09:17:18.7432319Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T09:17:18.7435305Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T09:17:18.7437069Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T09:17:18.7438851Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T09:17:18.7441320Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T09:17:18.7443394Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T09:17:18.7445061Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T09:17:18.7447478Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T09:17:18.7449273Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T09:17:18.7451537Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T09:17:18.7454478Z * [new branch] 
gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T09:17:18.7455984Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T09:17:18.7457817Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T09:17:18.7460575Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T09:17:18.7462391Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T09:17:18.7464308Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T09:17:18.7466788Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T09:17:18.7468563Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T09:17:18.7470353Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T09:17:18.7472809Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T09:17:18.7474609Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 2025-12-04T09:17:18.7476618Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T09:17:18.7479079Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T09:17:18.7480983Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T09:17:18.7482857Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T09:17:18.7485439Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T09:17:18.7487371Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T09:17:18.7489199Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T09:17:18.7491733Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T09:17:18.7493587Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T09:17:18.7495347Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 
2025-12-04T09:17:18.7498134Z * [new branch] gh/swolchok/861/base -> origin/gh/swolchok/861/base 2025-12-04T09:17:18.7500246Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T09:17:18.7502208Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T09:17:18.7504762Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T09:17:18.7506534Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 2025-12-04T09:17:18.7508182Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T09:17:18.7511321Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T09:17:18.7513162Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T09:17:18.7515066Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T09:17:18.7517884Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T09:17:18.7519543Z * [new branch] gh/swolchok/864/head -> origin/gh/swolchok/864/head 2025-12-04T09:17:18.7521332Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T09:17:18.7524008Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T09:17:18.7526038Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T09:17:18.7527948Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T09:17:18.7531063Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T09:17:18.7532845Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T09:17:18.7534705Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T09:17:18.7537330Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T09:17:18.7539085Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T09:17:18.7541121Z * [new branch] gh/swolchok/867/orig -> 
origin/gh/swolchok/867/orig 2025-12-04T09:17:18.7543581Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T09:17:18.7545690Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T09:17:18.7547487Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T09:17:18.7549735Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T09:17:18.7551630Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T09:17:18.7553588Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T09:17:18.7556218Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T09:17:18.7558078Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T09:17:18.7559987Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T09:17:18.7562553Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T09:17:18.7564528Z * [new branch] gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T09:17:18.7566410Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T09:17:18.7569701Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T09:17:18.7571588Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T09:17:18.7573433Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T09:17:18.7576619Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T09:17:18.7578445Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-12-04T09:17:18.7580403Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T09:17:18.7583050Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T09:17:18.7584916Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T09:17:18.7587407Z * [new branch] gh/tianyu-l/4/base -> 
origin/gh/tianyu-l/4/base 2025-12-04T09:17:18.7589194Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T09:17:18.7591032Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T09:17:18.7594551Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T09:17:18.7596333Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T09:17:18.7598144Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T09:17:18.7600697Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T09:17:18.7602581Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T09:17:18.7604434Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T09:17:18.7607138Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T09:17:18.7615630Z * [new branch] gh/tugsbayasgalan/17/head -> origin/gh/tugsbayasgalan/17/head 2025-12-04T09:17:18.7616080Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T09:17:18.7616425Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T09:17:18.7616679Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T09:17:18.7617434Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T09:17:18.7620856Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T09:17:18.7622586Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T09:17:18.7624397Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T09:17:18.7626805Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T09:17:18.7629293Z * [new branch] gh/tugsbayasgalan/32/head -> 
origin/gh/tugsbayasgalan/32/head 2025-12-04T09:17:18.7631385Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T09:17:18.7634094Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T09:17:18.7635954Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T09:17:18.7637767Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T09:17:18.7640283Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T09:17:18.7642113Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T09:17:18.7643950Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T09:17:18.7646535Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T09:17:18.7648358Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T09:17:18.7650151Z * [new branch] gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T09:17:18.7652646Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 2025-12-04T09:17:18.7654474Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T09:17:18.7656874Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T09:17:18.7659380Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T09:17:18.7661277Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 2025-12-04T09:17:18.7663040Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T09:17:18.7665620Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T09:17:18.7667622Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T09:17:18.7669369Z * [new branch] 
gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T09:17:18.7671693Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T09:17:18.7673609Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T09:17:18.7675455Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T09:17:18.7677962Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T09:17:18.7679778Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T09:17:18.7682103Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T09:17:18.7684871Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T09:17:18.7686825Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T09:17:18.7688734Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T09:17:18.7691379Z * [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T09:17:18.7693339Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T09:17:18.7695152Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T09:17:18.7697577Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T09:17:18.7699437Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T09:17:18.7701355Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T09:17:18.7703704Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T09:17:18.7705553Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T09:17:18.7707338Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T09:17:18.7710886Z * [new 
branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T09:17:18.7712545Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T09:17:18.7714885Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T09:17:18.7717545Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T09:17:18.7719327Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T09:17:18.7721163Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T09:17:18.7723793Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T09:17:18.7725597Z * [new branch] gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T09:17:18.7727435Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T09:17:18.7730255Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 2025-12-04T09:17:18.7732168Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T09:17:18.7733913Z * [new branch] gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T09:17:18.7736641Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T09:17:18.7738486Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T09:17:18.7740608Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T09:17:18.7743312Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T09:17:18.7745282Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T09:17:18.7747139Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T09:17:18.7749915Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 
2025-12-04T09:17:18.7751868Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T09:17:18.7753765Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T09:17:18.7756514Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T09:17:18.7758383Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T09:17:18.7760205Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T09:17:18.7762892Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T09:17:18.7764848Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T09:17:18.7766667Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T09:17:18.7769423Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T09:17:18.7771345Z * [new branch] gh/tugsbayasgalan/74/head -> origin/gh/tugsbayasgalan/74/head 2025-12-04T09:17:18.7773184Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T09:17:18.7775851Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T09:17:18.7777632Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T09:17:18.7779471Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T09:17:18.7782004Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T09:17:18.7784032Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T09:17:18.7786073Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T09:17:18.7788856Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T09:17:18.7790611Z * [new branch] gh/tugsbayasgalan/77/head -> 
origin/gh/tugsbayasgalan/77/head 2025-12-04T09:17:18.7792415Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T09:17:18.7795035Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T09:17:18.7797101Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 2025-12-04T09:17:18.7798933Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T09:17:18.7801526Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T09:17:18.7803356Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T09:17:18.7805188Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 2025-12-04T09:17:18.7807911Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T09:17:18.7809788Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T09:17:18.7811746Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T09:17:18.7814233Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T09:17:18.7815991Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T09:17:18.7817972Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T09:17:18.7820854Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T09:17:18.7822580Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T09:17:18.7824288Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T09:17:18.7827441Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T09:17:18.7829352Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T09:17:18.7831261Z * [new branch] 
gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T09:17:18.7833706Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T09:17:18.7835548Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T09:17:18.7837404Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T09:17:18.7840320Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T09:17:18.7842159Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T09:17:18.7844033Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T09:17:18.7847107Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T09:17:18.7849030Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T09:17:18.7850872Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T09:17:18.7853470Z * [new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T09:17:18.7855301Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T09:17:18.7857123Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T09:17:18.7860395Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T09:17:18.7862076Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T09:17:18.7863857Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T09:17:18.7866604Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T09:17:18.7868404Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T09:17:18.7870234Z * [new branch] gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T09:17:18.7872852Z 
* [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T09:17:18.7874739Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T09:17:18.7876476Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T09:17:18.7879033Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T09:17:18.7880742Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T09:17:18.7882620Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T09:17:18.7885625Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T09:17:18.7887725Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T09:17:18.7889537Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T09:17:18.7892394Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 2025-12-04T09:17:18.7894147Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T09:17:18.7896000Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T09:17:18.7898656Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T09:17:18.7900659Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T09:17:18.7902489Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T09:17:18.7905313Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T09:17:18.7907196Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T09:17:18.7909325Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T09:17:18.7912358Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T09:17:18.7914098Z * [new 
branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T09:17:18.7915896Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T09:17:18.7918308Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T09:17:18.7920214Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T09:17:18.7922090Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T09:17:18.7924640Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T09:17:18.7926457Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T09:17:18.7928255Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T09:17:18.7930712Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T09:17:18.7932546Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T09:17:18.7934340Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T09:17:18.7936904Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T09:17:18.7938814Z * [new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T09:17:18.7941315Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T09:17:18.7943903Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T09:17:18.7945680Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T09:17:18.7947534Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T09:17:18.7950654Z * [new branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T09:17:18.7952498Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T09:17:18.7954893Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T09:17:18.7956734Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T09:17:18.7958613Z * [new branch] gh/vishal9-team/2/orig -> origin/gh/vishal9-team/2/orig 2025-12-04T09:17:18.7961278Z * [new branch] gh/vishal9-team/3/base -> 
origin/gh/vishal9-team/3/base 2025-12-04T09:17:18.7963010Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T09:17:18.7964892Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T09:17:18.7967883Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T09:17:18.7969727Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T09:17:18.7971525Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T09:17:18.7974510Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T09:17:18.7977041Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T09:17:18.7979583Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T09:17:18.7982715Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T09:17:18.7984548Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T09:17:18.7986513Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T09:17:18.7989116Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T09:17:18.7991011Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T09:17:18.7992846Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T09:17:18.7995362Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T09:17:18.7997629Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T09:17:18.7999519Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T09:17:18.8002062Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T09:17:18.8003845Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T09:17:18.8005698Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T09:17:18.8008388Z * [new branch] 
gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T09:17:18.8010230Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T09:17:18.8012010Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T09:17:18.8014490Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T09:17:18.8016327Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T09:17:18.8019024Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T09:17:18.8021614Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T09:17:18.8024051Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T09:17:18.8025893Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T09:17:18.8028249Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T09:17:18.8030140Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 2025-12-04T09:17:18.8031930Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T09:17:18.8034470Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T09:17:18.8036215Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T09:17:18.8038267Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T09:17:18.8040566Z * [new branch] gh/wconstab/453/base -> origin/gh/wconstab/453/base 2025-12-04T09:17:18.8042514Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T09:17:18.8044860Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T09:17:18.8047260Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T09:17:18.8049150Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T09:17:18.8050940Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 
2025-12-04T09:17:18.8053487Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T09:17:18.8055335Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T09:17:18.8057204Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T09:17:18.8060064Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T09:17:18.8062140Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T09:17:18.8064031Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T09:17:18.8066633Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T09:17:18.8068686Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T09:17:18.8070719Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T09:17:18.8073267Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T09:17:18.8075106Z * [new branch] gh/wconstab/458/head -> origin/gh/wconstab/458/head 2025-12-04T09:17:18.8076926Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T09:17:18.8079388Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T09:17:18.8081337Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T09:17:18.8083087Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T09:17:18.8086382Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T09:17:18.8088452Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T09:17:18.8090362Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T09:17:18.8093105Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T09:17:18.8094994Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T09:17:18.8096847Z * [new branch] gh/wconstab/461/orig -> 
origin/gh/wconstab/461/orig 2025-12-04T09:17:18.8099316Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T09:17:18.8101273Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T09:17:18.8103170Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T09:17:18.8105882Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T09:17:18.8107929Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T09:17:18.8109937Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 2025-12-04T09:17:18.8112435Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T09:17:18.8114513Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T09:17:18.8116222Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T09:17:18.8118818Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T09:17:18.8120736Z * [new branch] gh/wconstab/465/head -> origin/gh/wconstab/465/head 2025-12-04T09:17:18.8122488Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T09:17:18.8125152Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T09:17:18.8126846Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T09:17:18.8129048Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T09:17:18.8132106Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T09:17:18.8134022Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T09:17:18.8135852Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T09:17:18.8138300Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T09:17:18.8140558Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T09:17:18.8142172Z * [new branch] 
gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T09:17:18.8145334Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T09:17:18.8147090Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T09:17:18.8149057Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T09:17:18.8151631Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T09:17:18.8153456Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T09:17:18.8155828Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T09:17:18.8158410Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T09:17:18.8160314Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T09:17:18.8162339Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T09:17:18.8165472Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-12-04T09:17:18.8167471Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T09:17:18.8169292Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T09:17:18.8171883Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T09:17:18.8174064Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T09:17:18.8175879Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T09:17:18.8179126Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T09:17:18.8181079Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T09:17:18.8182903Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T09:17:18.8185429Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T09:17:18.8187304Z * [new branch] 
gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T09:17:18.8189172Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-12-04T09:17:18.8191826Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T09:17:18.8193654Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T09:17:18.8195497Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T09:17:18.8198218Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T09:17:18.8200258Z * [new branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T09:17:18.8202132Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T09:17:18.8204571Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T09:17:18.8206527Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 2025-12-04T09:17:18.8208681Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T09:17:18.8214502Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T09:17:18.8216863Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T09:17:18.8218695Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T09:17:18.8221454Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T09:17:18.8223433Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T09:17:18.8225317Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T09:17:18.8227747Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T09:17:18.8229573Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T09:17:18.8231440Z * [new branch] 
gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T09:17:18.8235086Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T09:17:18.8236903Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T09:17:18.8238720Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T09:17:18.8241096Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T09:17:18.8243095Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T09:17:18.8245925Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T09:17:18.8248317Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T09:17:18.8249868Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T09:17:18.8251710Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 2025-12-04T09:17:18.8254226Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T09:17:18.8256231Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T09:17:18.8258031Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T09:17:18.8260811Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T09:17:18.8262639Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T09:17:18.8264422Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T09:17:18.8267028Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 2025-12-04T09:17:18.8269068Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T09:17:18.8270763Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T09:17:18.8273885Z * [new branch] 
gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T09:17:18.8276019Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T09:17:18.8277973Z * [new branch] gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T09:17:18.8280531Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T09:17:18.8282371Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T09:17:18.8284190Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T09:17:18.8286815Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T09:17:18.8288652Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T09:17:18.8290496Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T09:17:18.8293010Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 2025-12-04T09:17:18.8294969Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T09:17:18.8296793Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T09:17:18.8299569Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T09:17:18.8301553Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T09:17:18.8303422Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T09:17:18.8306037Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T09:17:18.8308003Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T09:17:18.8310054Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T09:17:18.8315971Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T09:17:18.8318308Z * [new branch] 
gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T09:17:18.8320382Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T09:17:18.8323139Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T09:17:18.8324930Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T09:17:18.8326679Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T09:17:18.8329255Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T09:17:18.8331216Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T09:17:18.8333030Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T09:17:18.8335702Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T09:17:18.8337547Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 2025-12-04T09:17:18.8339429Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T09:17:18.8342105Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T09:17:18.8344045Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 2025-12-04T09:17:18.8345715Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T09:17:18.8348389Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T09:17:18.8350148Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T09:17:18.8351977Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T09:17:18.8354626Z * [new branch] gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T09:17:18.8356598Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T09:17:18.8358413Z * [new branch] 
gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T09:17:18.8360998Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T09:17:18.8362826Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T09:17:18.8364650Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T09:17:18.8367286Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T09:17:18.8369158Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T09:17:18.8370962Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T09:17:18.8373599Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T09:17:18.8375403Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T09:17:18.8377206Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 2025-12-04T09:17:18.8379955Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T09:17:18.8381925Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T09:17:18.8383721Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T09:17:18.8386302Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T09:17:18.8388242Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T09:17:18.8390224Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T09:17:18.8392810Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T09:17:18.8394575Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T09:17:18.8396390Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T09:17:18.8398951Z * [new branch] 
gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T09:17:18.8400692Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T09:17:18.8402504Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T09:17:18.8404939Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T09:17:18.8406879Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T09:17:18.8408849Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T09:17:18.8411516Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T09:17:18.8413333Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T09:17:18.8415318Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 2025-12-04T09:17:18.8417865Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 2025-12-04T09:17:18.8420003Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T09:17:18.8422147Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T09:17:18.8425058Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T09:17:18.8426793Z * [new branch] gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T09:17:18.8428667Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T09:17:18.8431362Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T09:17:18.8433329Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T09:17:18.8435308Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T09:17:18.8437678Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T09:17:18.8439605Z * [new branch] 
gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T09:17:18.8441410Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T09:17:18.8443996Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T09:17:18.8445798Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T09:17:18.8447641Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T09:17:18.8450222Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T09:17:18.8452130Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T09:17:18.8453974Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T09:17:18.8456574Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T09:17:18.8458526Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 2025-12-04T09:17:18.8460521Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T09:17:18.8463083Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T09:17:18.8464875Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T09:17:18.8466782Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T09:17:18.8469781Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T09:17:18.8471633Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T09:17:18.8474063Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T09:17:18.8475861Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T09:17:18.8478345Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T09:17:18.8480126Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T09:17:18.8482067Z * [new branch] 
gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T09:17:18.8484492Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T09:17:18.8486290Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T09:17:18.8488131Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T09:17:18.8490779Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T09:17:18.8492453Z * [new branch] gh/xmfan/301/head -> origin/gh/xmfan/301/head 2025-12-04T09:17:18.8494183Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T09:17:18.8496675Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T09:17:18.8499061Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T09:17:18.8500920Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T09:17:18.8503381Z * [new branch] gh/xmfan/309/base -> origin/gh/xmfan/309/base 2025-12-04T09:17:18.8505172Z * [new branch] gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T09:17:18.8507121Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T09:17:18.8510367Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T09:17:18.8512008Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T09:17:18.8513814Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T09:17:18.8516329Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T09:17:18.8518135Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T09:17:18.8519949Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T09:17:18.8522960Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T09:17:18.8524777Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T09:17:18.8526594Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T09:17:18.8529139Z * [new branch] gh/xmfan/313/base 
-> origin/gh/xmfan/313/base 2025-12-04T09:17:18.8530942Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T09:17:18.8532897Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T09:17:18.8535935Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T09:17:18.8537743Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T09:17:18.8539731Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T09:17:18.8542428Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T09:17:18.8544078Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T09:17:18.8545879Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T09:17:18.8548431Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T09:17:18.8550268Z * [new branch] gh/xuanzhang816/33/head -> origin/gh/xuanzhang816/33/head 2025-12-04T09:17:18.8552099Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T09:17:18.8554905Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T09:17:18.8556834Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T09:17:18.8558662Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T09:17:18.8561377Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T09:17:18.8563197Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T09:17:18.8565184Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T09:17:18.8568105Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T09:17:18.8569906Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T09:17:18.8571988Z * [new branch] gh/yanbing-j/11/orig -> 
origin/gh/yanbing-j/11/orig 2025-12-04T09:17:18.8574498Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T09:17:18.8576350Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T09:17:18.8578157Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T09:17:18.8580934Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T09:17:18.8582751Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-12-04T09:17:18.8584555Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T09:17:18.8586978Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T09:17:18.8588825Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T09:17:18.8590720Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T09:17:18.8593067Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-12-04T09:17:18.8594917Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T09:17:18.8596707Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T09:17:18.8599164Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T09:17:18.8600941Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T09:17:18.8602765Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T09:17:18.8605331Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T09:17:18.8607193Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T09:17:18.8612133Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T09:17:18.8614681Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T09:17:18.8616506Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T09:17:18.8618359Z * [new branch] 
gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T09:17:18.8621058Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T09:17:18.8622865Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T09:17:18.8625359Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T09:17:18.8627113Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T09:17:18.8628977Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T09:17:18.8631349Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T09:17:18.8633394Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T09:17:18.8635195Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T09:17:18.8637746Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T09:17:18.8639709Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-12-04T09:17:18.8641698Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T09:17:18.8644485Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T09:17:18.8646412Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T09:17:18.8648323Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T09:17:18.8650669Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T09:17:18.8652515Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-12-04T09:17:18.8654406Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T09:17:18.8658072Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T09:17:18.8660470Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T09:17:18.8662269Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 
2025-12-04T09:17:18.8664678Z * [new branch] gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T09:17:18.8666855Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T09:17:18.8668734Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T09:17:18.8671329Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T09:17:18.8673220Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T09:17:18.8675077Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T09:17:18.8678141Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T09:17:18.8679990Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T09:17:18.8681826Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T09:17:18.8684386Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-12-04T09:17:18.8686292Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T09:17:18.8688078Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T09:17:18.8690506Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T09:17:18.8692383Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T09:17:18.8694586Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T09:17:18.8697105Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T09:17:18.8698925Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T09:17:18.8701029Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T09:17:18.8703511Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T09:17:18.8705177Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T09:17:18.8707062Z * [new branch] 
gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T09:17:18.8709939Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T09:17:18.8712128Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T09:17:18.8713913Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T09:17:18.8716351Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T09:17:18.8718483Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T09:17:18.8720860Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T09:17:18.8723879Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T09:17:18.8725544Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T09:17:18.8727845Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T09:17:18.8730449Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-12-04T09:17:18.8732371Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T09:17:18.8734671Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-12-04T09:17:18.8737358Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T09:17:18.8739195Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T09:17:18.8741227Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T09:17:18.8743716Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T09:17:18.8745636Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-12-04T09:17:18.8747364Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T09:17:18.8750032Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T09:17:18.8751958Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T09:17:18.8754124Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 
2025-12-04T09:17:18.8756483Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T09:17:18.8758447Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T09:17:18.8760082Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T09:17:18.8762844Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T09:17:18.8764709Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T09:17:18.8766034Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T09:17:18.8768855Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T09:17:18.8770851Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T09:17:18.8772193Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T09:17:18.8775127Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T09:17:18.8777015Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 2025-12-04T09:17:18.8779074Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T09:17:18.8781476Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T09:17:18.8783230Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T09:17:18.8785266Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T09:17:18.8787782Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T09:17:18.8789380Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T09:17:18.8791469Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T09:17:18.8794221Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T09:17:18.8796354Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T09:17:18.8797519Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T09:17:18.8800224Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 
2025-12-04T09:17:18.8801952Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T09:17:18.8804068Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T09:17:18.8806435Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T09:17:18.8808228Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T09:17:18.8813777Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T09:17:18.8816092Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T09:17:18.8817931Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T09:17:18.8819894Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T09:17:18.8822476Z * [new branch] gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T09:17:18.8824577Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T09:17:18.8825856Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 2025-12-04T09:17:18.8829213Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T09:17:18.8831374Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 2025-12-04T09:17:18.8832626Z * [new branch] gh/ydwu4/337/orig -> origin/gh/ydwu4/337/orig 2025-12-04T09:17:18.8835565Z * [new branch] gh/ydwu4/339/base -> origin/gh/ydwu4/339/base 2025-12-04T09:17:18.8837462Z * [new branch] gh/ydwu4/339/head -> origin/gh/ydwu4/339/head 2025-12-04T09:17:18.8839312Z * [new branch] gh/ydwu4/339/orig -> origin/gh/ydwu4/339/orig 2025-12-04T09:17:18.8842402Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-12-04T09:17:18.8844998Z * [new branch] gh/yf225/133/head -> origin/gh/yf225/133/head 2025-12-04T09:17:18.8847229Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-12-04T09:17:18.8849398Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-12-04T09:17:18.8852736Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 
2025-12-04T09:17:18.8855220Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-12-04T09:17:18.8856446Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-12-04T09:17:18.8859348Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-12-04T09:17:18.8861420Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-12-04T09:17:18.8863332Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-12-04T09:17:18.8866445Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-12-04T09:17:18.8868588Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-12-04T09:17:18.8870962Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-12-04T09:17:18.8872062Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-12-04T09:17:18.8876202Z * [new branch] gh/yushangdi/1/base -> origin/gh/yushangdi/1/base 2025-12-04T09:17:18.8878369Z * [new branch] gh/yushangdi/1/head -> origin/gh/yushangdi/1/head 2025-12-04T09:17:18.8880604Z * [new branch] gh/yushangdi/10/base -> origin/gh/yushangdi/10/base 2025-12-04T09:17:18.8882582Z * [new branch] gh/yushangdi/10/head -> origin/gh/yushangdi/10/head 2025-12-04T09:17:18.8883927Z * [new branch] gh/yushangdi/10/orig -> origin/gh/yushangdi/10/orig 2025-12-04T09:17:18.8886816Z * [new branch] gh/yushangdi/11/base -> origin/gh/yushangdi/11/base 2025-12-04T09:17:18.8888982Z * [new branch] gh/yushangdi/11/head -> origin/gh/yushangdi/11/head 2025-12-04T09:17:18.8890180Z * [new branch] gh/yushangdi/11/orig -> origin/gh/yushangdi/11/orig 2025-12-04T09:17:18.8900648Z * [new branch] gh/yushangdi/2/base -> origin/gh/yushangdi/2/base 2025-12-04T09:17:18.8901366Z * [new branch] gh/yushangdi/2/head -> origin/gh/yushangdi/2/head 2025-12-04T09:17:18.8901940Z * [new branch] gh/yushangdi/7/base -> origin/gh/yushangdi/7/base 2025-12-04T09:17:18.8902504Z * [new branch] gh/yushangdi/7/head -> 
origin/gh/yushangdi/7/head 2025-12-04T09:17:18.8903064Z * [new branch] gh/yushangdi/7/orig -> origin/gh/yushangdi/7/orig 2025-12-04T09:17:18.8903948Z * [new branch] gh/yushangdi/8/base -> origin/gh/yushangdi/8/base 2025-12-04T09:17:18.8906458Z * [new branch] gh/yushangdi/8/head -> origin/gh/yushangdi/8/head 2025-12-04T09:17:18.8907948Z * [new branch] gh/yushangdi/8/orig -> origin/gh/yushangdi/8/orig 2025-12-04T09:17:18.8910920Z * [new branch] gh/yushangdi/9/base -> origin/gh/yushangdi/9/base 2025-12-04T09:17:18.8912906Z * [new branch] gh/yushangdi/9/head -> origin/gh/yushangdi/9/head 2025-12-04T09:17:18.8914252Z * [new branch] gh/yushangdi/9/orig -> origin/gh/yushangdi/9/orig 2025-12-04T09:17:18.8917782Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-12-04T09:17:18.8919902Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-12-04T09:17:18.8921071Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-12-04T09:17:18.8924009Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-12-04T09:17:18.8926347Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-12-04T09:17:18.8927914Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-12-04T09:17:18.8930688Z * [new branch] gh/zklaus/21/base -> origin/gh/zklaus/21/base 2025-12-04T09:17:18.8932453Z * [new branch] gh/zklaus/21/head -> origin/gh/zklaus/21/head 2025-12-04T09:17:18.8934271Z * [new branch] gh/zklaus/21/orig -> origin/gh/zklaus/21/orig 2025-12-04T09:17:18.8937092Z * [new branch] gh/zklaus/22/base -> origin/gh/zklaus/22/base 2025-12-04T09:17:18.8938430Z * [new branch] gh/zklaus/22/head -> origin/gh/zklaus/22/head 2025-12-04T09:17:18.8940867Z * [new branch] gh/zklaus/22/orig -> origin/gh/zklaus/22/orig 2025-12-04T09:17:18.8943165Z * [new branch] gh/zklaus/23/base -> origin/gh/zklaus/23/base 2025-12-04T09:17:18.8944522Z * [new branch] gh/zklaus/23/head -> origin/gh/zklaus/23/head 2025-12-04T09:17:18.8946941Z * [new branch] 
gh/zklaus/23/orig -> origin/gh/zklaus/23/orig 2025-12-04T09:17:18.8949304Z * [new branch] gh/zklaus/24/base -> origin/gh/zklaus/24/base 2025-12-04T09:17:18.8951312Z * [new branch] gh/zklaus/24/head -> origin/gh/zklaus/24/head 2025-12-04T09:17:18.8952923Z * [new branch] gh/zklaus/24/orig -> origin/gh/zklaus/24/orig 2025-12-04T09:17:18.8956601Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-12-04T09:17:18.8957688Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-12-04T09:17:18.8960041Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-12-04T09:17:18.8962825Z * [new branch] gh/zou3519/1199/base -> origin/gh/zou3519/1199/base 2025-12-04T09:17:18.8964744Z * [new branch] gh/zou3519/1199/head -> origin/gh/zou3519/1199/head 2025-12-04T09:17:18.8966945Z * [new branch] gh/zou3519/1199/orig -> origin/gh/zou3519/1199/orig 2025-12-04T09:17:18.8969172Z * [new branch] gh/zou3519/1200/base -> origin/gh/zou3519/1200/base 2025-12-04T09:17:18.8971159Z * [new branch] gh/zou3519/1200/head -> origin/gh/zou3519/1200/head 2025-12-04T09:17:18.8972485Z * [new branch] gh/zou3519/1200/orig -> origin/gh/zou3519/1200/orig 2025-12-04T09:17:18.8975921Z * [new branch] gh/zou3519/1201/base -> origin/gh/zou3519/1201/base 2025-12-04T09:17:18.8976763Z * [new branch] gh/zou3519/1201/head -> origin/gh/zou3519/1201/head 2025-12-04T09:17:18.8978860Z * [new branch] gh/zou3519/1201/orig -> origin/gh/zou3519/1201/orig 2025-12-04T09:17:18.8981563Z * [new branch] gh/zou3519/1202/base -> origin/gh/zou3519/1202/base 2025-12-04T09:17:18.8982852Z * [new branch] gh/zou3519/1202/head -> origin/gh/zou3519/1202/head 2025-12-04T09:17:18.8985147Z * [new branch] gh/zou3519/1202/orig -> origin/gh/zou3519/1202/orig 2025-12-04T09:17:18.8988688Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-12-04T09:17:18.8989794Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-12-04T09:17:18.8992723Z * [new branch] gh/zpcore/11/base -> 
origin/gh/zpcore/11/base 2025-12-04T09:17:18.8994751Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-12-04T09:17:18.8997071Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-12-04T09:17:18.8999826Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-12-04T09:17:18.9001162Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-12-04T09:17:18.9003345Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-12-04T09:17:18.9006050Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-12-04T09:17:18.9007636Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-12-04T09:17:18.9009977Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-12-04T09:17:18.9012675Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-12-04T09:17:18.9014027Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-12-04T09:17:18.9016154Z * [new branch] gh/zpcore/14/orig -> origin/gh/zpcore/14/orig 2025-12-04T09:17:18.9018892Z * [new branch] gh/zpcore/15/base -> origin/gh/zpcore/15/base 2025-12-04T09:17:18.9021092Z * [new branch] gh/zpcore/15/head -> origin/gh/zpcore/15/head 2025-12-04T09:17:18.9022417Z * [new branch] gh/zpcore/15/orig -> origin/gh/zpcore/15/orig 2025-12-04T09:17:18.9025303Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-12-04T09:17:18.9027334Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-12-04T09:17:18.9030320Z * [new branch] gh/zpcore/21/base -> origin/gh/zpcore/21/base 2025-12-04T09:17:18.9032496Z * [new branch] gh/zpcore/21/head -> origin/gh/zpcore/21/head 2025-12-04T09:17:18.9033633Z * [new branch] gh/zpcore/21/orig -> origin/gh/zpcore/21/orig 2025-12-04T09:17:18.9037293Z * [new branch] gh/zpcore/22/base -> origin/gh/zpcore/22/base 2025-12-04T09:17:18.9038881Z * [new branch] gh/zpcore/22/head -> origin/gh/zpcore/22/head 2025-12-04T09:17:18.9041100Z * [new branch] gh/zpcore/22/orig -> 
origin/gh/zpcore/22/orig 2025-12-04T09:17:18.9043716Z * [new branch] gh/zpcore/23/base -> origin/gh/zpcore/23/base 2025-12-04T09:17:18.9045275Z * [new branch] gh/zpcore/23/head -> origin/gh/zpcore/23/head 2025-12-04T09:17:18.9047252Z * [new branch] gh/zpcore/23/orig -> origin/gh/zpcore/23/orig 2025-12-04T09:17:18.9049566Z * [new branch] gh/zpcore/24/base -> origin/gh/zpcore/24/base 2025-12-04T09:17:18.9051451Z * [new branch] gh/zpcore/24/head -> origin/gh/zpcore/24/head 2025-12-04T09:17:18.9053228Z * [new branch] gh/zpcore/24/orig -> origin/gh/zpcore/24/orig 2025-12-04T09:17:18.9056070Z * [new branch] gh/zpcore/25/base -> origin/gh/zpcore/25/base 2025-12-04T09:17:18.9057904Z * [new branch] gh/zpcore/25/head -> origin/gh/zpcore/25/head 2025-12-04T09:17:18.9060550Z * [new branch] gh/zpcore/25/orig -> origin/gh/zpcore/25/orig 2025-12-04T09:17:18.9063268Z * [new branch] gh/zpcore/26/base -> origin/gh/zpcore/26/base 2025-12-04T09:17:18.9065304Z * [new branch] gh/zpcore/26/head -> origin/gh/zpcore/26/head 2025-12-04T09:17:18.9067323Z * [new branch] gh/zpcore/26/orig -> origin/gh/zpcore/26/orig 2025-12-04T09:17:18.9069927Z * [new branch] gh/zpcore/27/base -> origin/gh/zpcore/27/base 2025-12-04T09:17:18.9071229Z * [new branch] gh/zpcore/27/head -> origin/gh/zpcore/27/head 2025-12-04T09:17:18.9073727Z * [new branch] gh/zpcore/27/orig -> origin/gh/zpcore/27/orig 2025-12-04T09:17:18.9076899Z * [new branch] gh/zpcore/28/base -> origin/gh/zpcore/28/base 2025-12-04T09:17:18.9078950Z * [new branch] gh/zpcore/28/head -> origin/gh/zpcore/28/head 2025-12-04T09:17:18.9080763Z * [new branch] gh/zpcore/28/orig -> origin/gh/zpcore/28/orig 2025-12-04T09:17:18.9083113Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-12-04T09:17:18.9084974Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-12-04T09:17:18.9087852Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-12-04T09:17:18.9090306Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 
2025-12-04T09:17:18.9092533Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-12-04T09:17:18.9094135Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-12-04T09:17:18.9096798Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-12-04T09:17:18.9098095Z * [new branch] gh/zpcore/6/head -> origin/gh/zpcore/6/head 2025-12-04T09:17:18.9101638Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-12-04T09:17:18.9102926Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-12-04T09:17:18.9105730Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-12-04T09:17:18.9108041Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-12-04T09:17:18.9112296Z * [new branch] google-main -> origin/google-main 2025-12-04T09:17:18.9115191Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-12-04T09:17:18.9116129Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-12-04T09:17:18.9119244Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-12-04T09:17:18.9121693Z * [new branch] hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass 2025-12-04T09:17:18.9123254Z * [new branch] hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests 2025-12-04T09:17:18.9125104Z * [new branch] hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose 2025-12-04T09:17:18.9127228Z * [new branch] hc_baseline -> origin/hc_baseline 2025-12-04T09:17:18.9129507Z * [new branch] hhh_rand -> origin/hhh_rand 2025-12-04T09:17:18.9131614Z * [new branch] huba/f1 -> origin/huba/f1 2025-12-04T09:17:18.9134294Z * [new branch] increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test 2025-12-04T09:17:18.9135206Z * [new branch] inlining -> origin/inlining 2025-12-04T09:17:18.9137577Z * [new branch] 
inlining-ezyang -> origin/inlining-ezyang 2025-12-04T09:17:18.9139937Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-12-04T09:17:18.9141572Z * [new branch] instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters 2025-12-04T09:17:18.9143142Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-12-04T09:17:18.9145473Z * [new branch] issue#58739 -> origin/issue#58739 2025-12-04T09:17:18.9147581Z * [new branch] jainapurva-patch-1 -> origin/jainapurva-patch-1 2025-12-04T09:17:18.9149955Z * [new branch] jathu/o3 -> origin/jathu/o3 2025-12-04T09:17:18.9151669Z * [new branch] jathu/sve -> origin/jathu/sve 2025-12-04T09:17:18.9154411Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-12-04T09:17:18.9155797Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-12-04T09:17:18.9158758Z * [new branch] jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter 2025-12-04T09:17:18.9160105Z * [new branch] jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning 2025-12-04T09:17:18.9162396Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-12-04T09:17:18.9164485Z * [new branch] jithunnair-amd-patch-10 -> origin/jithunnair-amd-patch-10 2025-12-04T09:17:18.9165904Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-12-04T09:17:18.9168441Z * [new branch] jithunnair-amd-patch-3 -> origin/jithunnair-amd-patch-3 2025-12-04T09:17:18.9170312Z * [new branch] jithunnair-amd-patch-4 -> origin/jithunnair-amd-patch-4 2025-12-04T09:17:18.9171741Z * [new branch] jithunnair-amd-patch-5 -> origin/jithunnair-amd-patch-5 2025-12-04T09:17:18.9174080Z * [new branch] jithunnair-amd-patch-6 -> origin/jithunnair-amd-patch-6 2025-12-04T09:17:18.9175993Z * [new branch] jithunnair-amd-patch-7 -> origin/jithunnair-amd-patch-7 
2025-12-04T09:17:18.9177948Z * [new branch] jithunnair-amd-patch-8 -> origin/jithunnair-amd-patch-8 2025-12-04T09:17:18.9180301Z * [new branch] jithunnair-amd-patch-9 -> origin/jithunnair-amd-patch-9 2025-12-04T09:17:18.9182871Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-12-04T09:17:18.9185466Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-12-04T09:17:18.9187268Z * [new branch] kainan_test -> origin/kainan_test 2025-12-04T09:17:18.9189433Z * [new branch] larryliu0820-patch-1 -> origin/larryliu0820-patch-1 2025-12-04T09:17:18.9192013Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-12-04T09:17:18.9194645Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 2025-12-04T09:17:18.9197228Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-12-04T09:17:18.9198478Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-12-04T09:17:18.9200597Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-12-04T09:17:18.9201976Z * [new branch] llama4-stable -> origin/llama4-stable 2025-12-04T09:17:18.9205513Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-12-04T09:17:18.9208281Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-12-04T09:17:18.9209599Z * [new branch] lucaskabela/fix_164876 -> origin/lucaskabela/fix_164876 2025-12-04T09:17:18.9211659Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-12-04T09:17:18.9213045Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-12-04T09:17:18.9215299Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-12-04T09:17:18.9216620Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-12-04T09:17:18.9219317Z * [new 
branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-12-04T09:17:18.9222007Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-12-04T09:17:18.9223142Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-12-04T09:17:18.9225478Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-12-04T09:17:18.9226830Z * [new branch] lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager 2025-12-04T09:17:18.9228972Z * [new branch] lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module 2025-12-04T09:17:18.9230452Z * [new branch] lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined 2025-12-04T09:17:18.9232691Z * [new branch] lucaskabela/typing_variables -> origin/lucaskabela/typing_variables 2025-12-04T09:17:18.9234178Z * [new branch] lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts 2025-12-04T09:17:18.9236429Z * [new branch] lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions 2025-12-04T09:17:18.9237941Z * [new branch] lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists 2025-12-04T09:17:18.9240891Z * [new branch] lw/torch_box_by_ref -> origin/lw/torch_box_by_ref 2025-12-04T09:17:18.9242732Z * [new branch] main -> origin/main 2025-12-04T09:17:18.9245205Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-12-04T09:17:18.9247381Z * [new branch] malfet-patch-2 -> origin/malfet-patch-2 2025-12-04T09:17:18.9249484Z * [new branch] malfet-patch-3 -> origin/malfet-patch-3 2025-12-04T09:17:18.9251888Z * [new branch] malfet-patch-4 -> origin/malfet-patch-4 2025-12-04T09:17:18.9254094Z * [new branch] malfet-patch-5 -> origin/malfet-patch-5 2025-12-04T09:17:18.9255596Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-12-04T09:17:18.9257847Z * [new branch] 
malfet-patch-7 -> origin/malfet-patch-7 2025-12-04T09:17:18.9260171Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-12-04T09:17:18.9263119Z * [new branch] malfet/add-3.14-ci -> origin/malfet/add-3.14-ci 2025-12-04T09:17:18.9264659Z * [new branch] malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts 2025-12-04T09:17:18.9266379Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-12-04T09:17:18.9268876Z * [new branch] malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers 2025-12-04T09:17:18.9270916Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-12-04T09:17:18.9273572Z * [new branch] manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe 2025-12-04T09:17:18.9274949Z * [new branch] manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp 2025-12-04T09:17:18.9278290Z * [new branch] masnesral/metaconda -> origin/masnesral/metaconda 2025-12-04T09:17:18.9280390Z * [new branch] mem_profiler_flaky_fix -> origin/mem_profiler_flaky_fix 2025-12-04T09:17:18.9282364Z * [new branch] mem_profiler_stack_trace -> origin/mem_profiler_stack_trace 2025-12-04T09:17:18.9283997Z * [new branch] memory_profiler_stack -> origin/memory_profiler_stack 2025-12-04T09:17:18.9286306Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-12-04T09:17:18.9287748Z * [new branch] mingw_posix -> origin/mingw_posix 2025-12-04T09:17:18.9291258Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-12-04T09:17:18.9292082Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-12-04T09:17:18.9294053Z * [new branch] mlazos/acts -> origin/mlazos/acts 2025-12-04T09:17:18.9295701Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-12-04T09:17:18.9297707Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 
2025-12-04T09:17:18.9299107Z * [new branch] mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks 2025-12-04T09:17:18.9301105Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-12-04T09:17:18.9302893Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-12-04T09:17:18.9304167Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-12-04T09:17:18.9306688Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-12-04T09:17:18.9308912Z * [new branch] mlazos/bwd -> origin/mlazos/bwd 2025-12-04T09:17:18.9310835Z * [new branch] mlazos/combo-test -> origin/mlazos/combo-test 2025-12-04T09:17:18.9312709Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-12-04T09:17:18.9314592Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-12-04T09:17:18.9316547Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-12-04T09:17:18.9318281Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-12-04T09:17:18.9320277Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-12-04T09:17:18.9322421Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-12-04T09:17:18.9324104Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-12-04T09:17:18.9326280Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-12-04T09:17:18.9327637Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-12-04T09:17:18.9329803Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-12-04T09:17:18.9331673Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-12-04T09:17:18.9333513Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-12-04T09:17:18.9335449Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-12-04T09:17:18.9337494Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-12-04T09:17:18.9339499Z * [new branch] mlazos/extract-examples -> 
origin/mlazos/extract-examples 2025-12-04T09:17:18.9341329Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-12-04T09:17:18.9343126Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-12-04T09:17:18.9345028Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-12-04T09:17:18.9346890Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-12-04T09:17:18.9348696Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-12-04T09:17:18.9350533Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-12-04T09:17:18.9352403Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-12-04T09:17:18.9354308Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-12-04T09:17:18.9356426Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-12-04T09:17:18.9357958Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-12-04T09:17:18.9360060Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-12-04T09:17:18.9361877Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-12-04T09:17:18.9363756Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-12-04T09:17:18.9365584Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-12-04T09:17:18.9367551Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-12-04T09:17:18.9369398Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-12-04T09:17:18.9371313Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 2025-12-04T09:17:18.9373144Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-12-04T09:17:18.9374959Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-12-04T09:17:18.9376979Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-12-04T09:17:18.9379256Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-12-04T09:17:18.9381128Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-12-04T09:17:18.9383193Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-12-04T09:17:18.9384719Z * [new branch] mlazos/hc4 -> 
origin/mlazos/hc4 2025-12-04T09:17:18.9386719Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-12-04T09:17:18.9389033Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-12-04T09:17:18.9390865Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-12-04T09:17:18.9392540Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-12-04T09:17:18.9394438Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-12-04T09:17:18.9396339Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-12-04T09:17:18.9398232Z * [new branch] mlazos/inductor-streams -> origin/mlazos/inductor-streams 2025-12-04T09:17:18.9399546Z * [new branch] mlazos/main -> origin/mlazos/main 2025-12-04T09:17:18.9401793Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-12-04T09:17:18.9403702Z * [new branch] mlazos/meta-guards -> origin/mlazos/meta-guards 2025-12-04T09:17:18.9406391Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-12-04T09:17:18.9407985Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-12-04T09:17:18.9410214Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-12-04T09:17:18.9412171Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-12-04T09:17:18.9413969Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-12-04T09:17:18.9415585Z * [new branch] mlazos/overguarding -> origin/mlazos/overguarding 2025-12-04T09:17:18.9417866Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-12-04T09:17:18.9419610Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-12-04T09:17:18.9421586Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-12-04T09:17:18.9423442Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-12-04T09:17:18.9425296Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-12-04T09:17:18.9427305Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-12-04T09:17:18.9429177Z * [new branch] 
mlazos/rtp -> origin/mlazos/rtp 2025-12-04T09:17:18.9431073Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-12-04T09:17:18.9432978Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 2025-12-04T09:17:18.9434316Z * [new branch] mlazos/stests -> origin/mlazos/stests 2025-12-04T09:17:18.9436538Z * [new branch] mlazos/stream-ops -> origin/mlazos/stream-ops 2025-12-04T09:17:18.9438433Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-12-04T09:17:18.9440334Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-12-04T09:17:18.9442165Z * [new branch] mlazos/test -> origin/mlazos/test 2025-12-04T09:17:18.9444054Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-12-04T09:17:18.9446138Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-12-04T09:17:18.9447490Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-12-04T09:17:18.9449804Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-12-04T09:17:18.9451887Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-12-04T09:17:18.9453159Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-12-04T09:17:18.9455346Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-12-04T09:17:18.9457342Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-12-04T09:17:18.9459415Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-12-04T09:17:18.9461073Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-12-04T09:17:18.9463193Z * [new branch] mlazos/user-stream-base -> origin/mlazos/user-stream-base 2025-12-04T09:17:18.9464659Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-12-04T09:17:18.9467099Z * [new branch] mlazos/user-streams-backup -> origin/mlazos/user-streams-backup 2025-12-04T09:17:18.9468439Z * [new branch] mlazos/user-streams-backup2 -> 
origin/mlazos/user-streams-backup2 2025-12-04T09:17:18.9470617Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-12-04T09:17:18.9472630Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-12-04T09:17:18.9473974Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-12-04T09:17:18.9476297Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-12-04T09:17:18.9478295Z * [new branch] module-shim -> origin/module-shim 2025-12-04T09:17:18.9480267Z * [new branch] move_config -> origin/move_config 2025-12-04T09:17:18.9483076Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-12-04T09:17:18.9485731Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-12-04T09:17:18.9488335Z * [new branch] mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape 2025-12-04T09:17:18.9490346Z * [new branch] my_varlen_backup -> origin/my_varlen_backup 2025-12-04T09:17:18.9491935Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-12-04T09:17:18.9494171Z * [new branch] new-codegen -> origin/new-codegen 2025-12-04T09:17:18.9495983Z * [new branch] newtest-base -> origin/newtest-base 2025-12-04T09:17:18.9498629Z * [new branch] ngimel/addmm_dtype -> origin/ngimel/addmm_dtype 2025-12-04T09:17:18.9500304Z * [new branch] ngimel/div_inv -> origin/ngimel/div_inv 2025-12-04T09:17:18.9501643Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-12-04T09:17:18.9504079Z * [new branch] ngimel/gather_grid -> origin/ngimel/gather_grid 2025-12-04T09:17:18.9505240Z * [new branch] ngimel/gather_grid_release -> origin/ngimel/gather_grid_release 2025-12-04T09:17:18.9507366Z * [new branch] ngimel/gg_new -> origin/ngimel/gg_new 2025-12-04T09:17:18.9509799Z * [new branch] ngimel/hostalloc -> origin/ngimel/hostalloc 2025-12-04T09:17:18.9512227Z * [new branch] ngimel/storage_id -> origin/ngimel/storage_id 2025-12-04T09:17:18.9514260Z * [new branch] nightly -> origin/nightly 
2025-12-04T09:17:18.9516992Z * [new branch] nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check 2025-12-04T09:17:18.9518441Z * [new branch] nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias 2025-12-04T09:17:18.9520553Z * [new branch] nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor 2025-12-04T09:17:18.9522995Z * [new branch] nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch 2025-12-04T09:17:18.9524381Z * [new branch] nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions 2025-12-04T09:17:18.9526877Z * [new branch] nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index 2025-12-04T09:17:18.9528466Z * [new branch] nikitaved/test -> origin/nikitaved/test 2025-12-04T09:17:18.9531005Z * [new branch] nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune 2025-12-04T09:17:18.9532495Z * [new branch] no_distributed_log_spew -> origin/no_distributed_log_spew 2025-12-04T09:17:18.9534741Z * [new branch] nofun-hack -> origin/nofun-hack 2025-12-04T09:17:18.9536814Z * [new branch] norm_bench -> origin/norm_bench 2025-12-04T09:17:18.9539474Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-12-04T09:17:18.9541548Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-12-04T09:17:18.9543449Z * [new branch] optimizer_test -> origin/optimizer_test 2025-12-04T09:17:18.9546671Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-12-04T09:17:18.9548575Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-12-04T09:17:18.9550430Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-12-04T09:17:18.9552470Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-12-04T09:17:18.9554417Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 
2025-12-04T09:17:18.9556392Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-12-04T09:17:18.9558287Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-12-04T09:17:18.9560190Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-12-04T09:17:18.9562057Z * [new branch] orig/release/2.0 -> origin/orig/release/2.0 2025-12-04T09:17:18.9563878Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-12-04T09:17:18.9566020Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-12-04T09:17:18.9567398Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-12-04T09:17:18.9569656Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-12-04T09:17:18.9572084Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-12-04T09:17:18.9573396Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-12-04T09:17:18.9576536Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-12-04T09:17:18.9579168Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-12-04T09:17:18.9580950Z * [new branch] orig/release/2.9 -> origin/orig/release/2.9 2025-12-04T09:17:18.9585085Z * [new branch] origin/gh/fxdawnn/1/base -> origin/origin/gh/fxdawnn/1/base 2025-12-04T09:17:18.9586484Z * [new branch] origin/gh/fxdawnn/1/orig -> origin/origin/gh/fxdawnn/1/orig 2025-12-04T09:17:18.9589993Z * [new branch] origin/gh/zpcore/14/orig -> origin/origin/gh/zpcore/14/orig 2025-12-04T09:17:18.9592063Z * [new branch] oulgen-patch-1 -> origin/oulgen-patch-1 2025-12-04T09:17:18.9594329Z * [new branch] oulgen-patch-2 -> origin/oulgen-patch-2 2025-12-04T09:17:18.9596272Z * [new branch] oulgen-patch-3 -> origin/oulgen-patch-3 2025-12-04T09:17:18.9598254Z * [new branch] oulgen-patch-4 -> origin/oulgen-patch-4 2025-12-04T09:17:18.9600246Z * [new branch] padded-tensor -> origin/padded-tensor 2025-12-04T09:17:18.9602143Z * [new branch] pca2 -> origin/pca2 2025-12-04T09:17:18.9604377Z * [new branch] 
per_channel_backup -> origin/per_channel_backup 2025-12-04T09:17:18.9606398Z * [new branch] perf_ops -> origin/perf_ops 2025-12-04T09:17:18.9608164Z * [new branch] perf_ops_2_9 -> origin/perf_ops_2_9 2025-12-04T09:17:18.9610538Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-12-04T09:17:18.9613109Z * [new branch] pianpwk/__draft_debug_mode -> origin/pianpwk/__draft_debug_mode 2025-12-04T09:17:18.9614503Z * [new branch] pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft 2025-12-04T09:17:18.9616535Z * [new branch] pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile 2025-12-04T09:17:18.9617922Z * [new branch] pianpwk/_draft_triton_11_3 -> origin/pianpwk/_draft_triton_11_3 2025-12-04T09:17:18.9620248Z * [new branch] pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft 2025-12-04T09:17:18.9622281Z * [new branch] pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys 2025-12-04T09:17:18.9624419Z * [new branch] pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode 2025-12-04T09:17:18.9626484Z * [new branch] pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size 2025-12-04T09:17:18.9628358Z * [new branch] pianpwk/anomaly_tb -> origin/pianpwk/anomaly_tb 2025-12-04T09:17:18.9629750Z * [new branch] pianpwk/auto_fx_annotate -> origin/pianpwk/auto_fx_annotate 2025-12-04T09:17:18.9631974Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-12-04T09:17:18.9633365Z * [new branch] pianpwk/bert_dynamic_perf -> origin/pianpwk/bert_dynamic_perf 2025-12-04T09:17:18.9635649Z * [new branch] pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces 2025-12-04T09:17:18.9637561Z * [new branch] pianpwk/debug_hash_tensor -> origin/pianpwk/debug_hash_tensor 2025-12-04T09:17:18.9639433Z * [new branch] pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate 
2025-12-04T09:17:18.9641184Z * [new branch] pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults 2025-12-04T09:17:18.9650464Z * [new branch] pianpwk/debug_mode_hacks -> origin/pianpwk/debug_mode_hacks 2025-12-04T09:17:18.9651329Z * [new branch] pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor 2025-12-04T09:17:18.9652018Z * [new branch] pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids 2025-12-04T09:17:18.9652697Z * [new branch] pianpwk/debug_mode_triton -> origin/pianpwk/debug_mode_triton 2025-12-04T09:17:18.9653331Z * [new branch] pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace 2025-12-04T09:17:18.9654079Z * [new branch] pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective 2025-12-04T09:17:18.9654769Z * [new branch] pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf 2025-12-04T09:17:18.9656135Z * [new branch] pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug 2025-12-04T09:17:18.9658136Z * [new branch] pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile 2025-12-04T09:17:18.9659999Z * [new branch] pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn 2025-12-04T09:17:18.9662384Z * [new branch] pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5 2025-12-04T09:17:18.9663672Z * [new branch] pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk 2025-12-04T09:17:18.9665794Z * [new branch] pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath 2025-12-04T09:17:18.9667842Z * [new branch] pianpwk/event_list_tree -> origin/pianpwk/event_list_tree 2025-12-04T09:17:18.9669627Z * [new branch] pianpwk/false_numel_refs -> origin/pianpwk/false_numel_refs 2025-12-04T09:17:18.9671452Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-12-04T09:17:18.9673383Z * [new branch] 
pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft 2025-12-04T09:17:18.9675850Z * [new branch] pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat 2025-12-04T09:17:18.9677716Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-12-04T09:17:18.9679496Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-12-04T09:17:18.9681416Z * [new branch] pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate 2025-12-04T09:17:18.9683271Z * [new branch] pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards 2025-12-04T09:17:18.9685072Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-12-04T09:17:18.9687139Z * [new branch] pianpwk/symint_one_hot -> origin/pianpwk/symint_one_hot 2025-12-04T09:17:18.9689107Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-12-04T09:17:18.9690877Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-12-04T09:17:18.9692647Z * [new branch] pianpwk/try_dumb_stuff -> origin/pianpwk/try_dumb_stuff 2025-12-04T09:17:18.9694564Z * [new branch] pianpwk/try_dumb_stuff_2 -> origin/pianpwk/try_dumb_stuff_2 2025-12-04T09:17:18.9696391Z * [new branch] pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm 2025-12-04T09:17:18.9698263Z * [new branch] pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2 2025-12-04T09:17:18.9700166Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-12-04T09:17:18.9702082Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-12-04T09:17:18.9704593Z * [new branch] piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112 2025-12-04T09:17:18.9706272Z * [new branch] piz/prop_cache_clean -> origin/piz/prop_cache_clean 2025-12-04T09:17:18.9708345Z * 
[new branch] pool-separate -> origin/pool-separate 2025-12-04T09:17:18.9710447Z * [new branch] pr-156087 -> origin/pr-156087 2025-12-04T09:17:18.9712910Z * [new branch] pr/131860 -> origin/pr/131860 2025-12-04T09:17:18.9715050Z * [new branch] predispatch_to -> origin/predispatch_to 2025-12-04T09:17:18.9716937Z * [new branch] protect-c17 -> origin/protect-c17 2025-12-04T09:17:18.9718861Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-12-04T09:17:18.9721320Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-12-04T09:17:18.9724093Z * [new branch] q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown 2025-12-04T09:17:18.9725739Z * [new branch] q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args 2025-12-04T09:17:18.9728790Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-12-04T09:17:18.9730784Z * [new branch] quote-pytest_cache -> origin/quote-pytest_cache 2025-12-04T09:17:18.9732990Z * [new branch] reland-accgrad-stream-warn -> origin/reland-accgrad-stream-warn 2025-12-04T09:17:18.9735780Z * [new branch] release/1.10 -> origin/release/1.10 2025-12-04T09:17:18.9737483Z * [new branch] release/1.11 -> origin/release/1.11 2025-12-04T09:17:18.9739378Z * [new branch] release/1.12 -> origin/release/1.12 2025-12-04T09:17:18.9741272Z * [new branch] release/1.13 -> origin/release/1.13 2025-12-04T09:17:18.9743122Z * [new branch] release/1.4 -> origin/release/1.4 2025-12-04T09:17:18.9744725Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-12-04T09:17:18.9746604Z * [new branch] release/1.5 -> origin/release/1.5 2025-12-04T09:17:18.9748444Z * [new branch] release/1.6 -> origin/release/1.6 2025-12-04T09:17:18.9750902Z * [new branch] release/1.7 -> origin/release/1.7 2025-12-04T09:17:18.9752822Z * [new branch] release/1.8 -> origin/release/1.8 2025-12-04T09:17:18.9754647Z * [new branch] release/1.9 -> 
origin/release/1.9 2025-12-04T09:17:18.9756491Z * [new branch] release/2.0 -> origin/release/2.0 2025-12-04T09:17:18.9758520Z * [new branch] release/2.1 -> origin/release/2.1 2025-12-04T09:17:18.9760338Z * [new branch] release/2.2 -> origin/release/2.2 2025-12-04T09:17:18.9762528Z * [new branch] release/2.3 -> origin/release/2.3 2025-12-04T09:17:18.9764869Z * [new branch] release/2.4 -> origin/release/2.4 2025-12-04T09:17:18.9767250Z * [new branch] release/2.5 -> origin/release/2.5 2025-12-04T09:17:18.9769378Z * [new branch] release/2.6 -> origin/release/2.6 2025-12-04T09:17:18.9771366Z * [new branch] release/2.7 -> origin/release/2.7 2025-12-04T09:17:18.9773232Z * [new branch] release/2.8 -> origin/release/2.8 2025-12-04T09:17:18.9775388Z * [new branch] release/2.9 -> origin/release/2.9 2025-12-04T09:17:18.9777272Z * [new branch] release_notes -> origin/release_notes 2025-12-04T09:17:18.9779382Z * [new branch] remove_pyinterpreter -> origin/remove_pyinterpreter 2025-12-04T09:17:18.9781693Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-12-04T09:17:18.9783441Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-12-04T09:17:18.9785114Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-12-04T09:17:18.9786987Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-12-04T09:17:18.9790683Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-12-04T09:17:18.9794263Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-12-04T09:17:18.9797982Z * [new branch] revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head 2025-12-04T09:17:18.9801931Z * [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 
2025-12-04T09:17:18.9804171Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-12-04T09:17:18.9805934Z * [new branch] revert-hoo-invoke-subgraph -> origin/revert-hoo-invoke-subgraph 2025-12-04T09:17:18.9807939Z * [new branch] revert_always_build_distributed -> origin/revert_always_build_distributed 2025-12-04T09:17:18.9809885Z * [new branch] rms_norm_patch -> origin/rms_norm_patch 2025-12-04T09:17:18.9812614Z * [new branch] ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation 2025-12-04T09:17:18.9814126Z * [new branch] ruisi/fix_comm_estimation -> origin/ruisi/fix_comm_estimation 2025-12-04T09:17:18.9815987Z * [new branch] ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation 2025-12-04T09:17:18.9817685Z * [new branch] ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing 2025-12-04T09:17:18.9819901Z * [new branch] ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass 2025-12-04T09:17:18.9822057Z * [new branch] ruisi/manual_bucket_pass -> origin/ruisi/manual_bucket_pass 2025-12-04T09:17:18.9824862Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-12-04T09:17:18.9826703Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-12-04T09:17:18.9829203Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-12-04T09:17:18.9830884Z * [new branch] rzou/njt -> origin/rzou/njt 2025-12-04T09:17:18.9832726Z * [new branch] rzou/pca -> origin/rzou/pca 2025-12-04T09:17:18.9834443Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-12-04T09:17:18.9836442Z * [new branch] samplevllm -> origin/samplevllm 2025-12-04T09:17:18.9839384Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 
2025-12-04T09:17:18.9841164Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-12-04T09:17:18.9843161Z * [new branch] sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain 2025-12-04T09:17:18.9845062Z * [new branch] save -> origin/save 2025-12-04T09:17:18.9847097Z * [new branch] scaled_mm -> origin/scaled_mm 2025-12-04T09:17:18.9849031Z * [new branch] scan_attempt -> origin/scan_attempt 2025-12-04T09:17:18.9851581Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-12-04T09:17:18.9854052Z * [new branch] sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix 2025-12-04T09:17:18.9856895Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-12-04T09:17:18.9858937Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-12-04T09:17:18.9861042Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-12-04T09:17:18.9862944Z * [new branch] some_rocm_inductor_skips -> origin/some_rocm_inductor_skips 2025-12-04T09:17:18.9865432Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-12-04T09:17:18.9867389Z * [new branch] sparse-mm-bf16-support -> origin/sparse-mm-bf16-support 2025-12-04T09:17:18.9869306Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-12-04T09:17:18.9871255Z * [new branch] suo -> origin/suo 2025-12-04T09:17:18.9873110Z * [new branch] sve-poc -> origin/sve-poc 2025-12-04T09:17:18.9875218Z * [new branch] switch-bn -> origin/switch-bn 2025-12-04T09:17:18.9877197Z * [new branch] sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop 2025-12-04T09:17:18.9879098Z * [new branch] sy_aot_eager_record -> origin/sy_aot_eager_record 2025-12-04T09:17:18.9881895Z * [new branch] sy_custom_bucketing -> origin/sy_custom_bucketing 2025-12-04T09:17:18.9883945Z * [new branch] sy_debug_mode_test -> origin/sy_debug_mode_test 2025-12-04T09:17:18.9885263Z * [new branch] sy_deserialize -> origin/sy_deserialize 
2025-12-04T09:17:18.9887252Z * [new branch] sy_dump_gm_code -> origin/sy_dump_gm_code 2025-12-04T09:17:18.9889102Z * [new branch] sy_exp -> origin/sy_exp 2025-12-04T09:17:18.9891105Z * [new branch] sy_export_annotation -> origin/sy_export_annotation 2025-12-04T09:17:18.9893060Z * [new branch] sy_invoke_subgraph -> origin/sy_invoke_subgraph 2025-12-04T09:17:18.9894992Z * [new branch] sy_kernel_bw_name -> origin/sy_kernel_bw_name 2025-12-04T09:17:18.9896866Z * [new branch] sy_multi_arch -> origin/sy_multi_arch 2025-12-04T09:17:18.9898827Z * [new branch] sy_nn_module_stack -> origin/sy_nn_module_stack 2025-12-04T09:17:18.9901001Z * [new branch] sy_original_dtensor -> origin/sy_original_dtensor 2025-12-04T09:17:18.9902911Z * [new branch] sy_profiler_cia -> origin/sy_profiler_cia 2025-12-04T09:17:18.9904767Z * [new branch] symm_mem_sync -> origin/symm_mem_sync 2025-12-04T09:17:18.9906842Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 2025-12-04T09:17:18.9908831Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-12-04T09:17:18.9912762Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-12-04T09:17:18.9914582Z * [new branch] test-old -> origin/test-old 2025-12-04T09:17:18.9917162Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-12-04T09:17:18.9919743Z * [new branch] tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix 2025-12-04T09:17:18.9921615Z * [new branch] tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune 2025-12-04T09:17:18.9923238Z * [new branch] tianren/customOp_fusion -> origin/tianren/customOp_fusion 2025-12-04T09:17:18.9925077Z * [new branch] tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark 2025-12-04T09:17:18.9926940Z * [new branch] tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix 2025-12-04T09:17:18.9929353Z * [new branch] 
tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config 2025-12-04T09:17:18.9931214Z * [new branch] tianren/dynamic_range_input -> origin/tianren/dynamic_range_input 2025-12-04T09:17:18.9933077Z * [new branch] tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix 2025-12-04T09:17:18.9934923Z * [new branch] tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge 2025-12-04T09:17:18.9936844Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-12-04T09:17:18.9938655Z * [new branch] tianren/fx_codegen_dump -> origin/tianren/fx_codegen_dump 2025-12-04T09:17:18.9940672Z * [new branch] tianren/symmetric_memory -> origin/tianren/symmetric_memory 2025-12-04T09:17:18.9942465Z * [new branch] tianren/test -> origin/tianren/test 2025-12-04T09:17:18.9944407Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-12-04T09:17:18.9946307Z * [new branch] tmp -> origin/tmp 2025-12-04T09:17:18.9948259Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-12-04T09:17:18.9950218Z * [new branch] torchtitan_integration -> origin/torchtitan_integration 2025-12-04T09:17:18.9952369Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-12-04T09:17:18.9954106Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-12-04T09:17:18.9956062Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-12-04T09:17:18.9958055Z * [new branch] triton_kernel -> origin/triton_kernel 2025-12-04T09:17:18.9959977Z * [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-12-04T09:17:18.9961887Z * [new branch] type_dec -> origin/type_dec 2025-12-04T09:17:18.9963858Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-12-04T09:17:18.9966706Z * [new branch] update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1 2025-12-04T09:17:18.9968423Z * [new branch] 
update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1 2025-12-04T09:17:18.9970239Z * [new branch] update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1 2025-12-04T09:17:18.9971997Z * [new branch] update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1 2025-12-04T09:17:18.9973776Z * [new branch] update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1 2025-12-04T09:17:18.9975855Z * [new branch] update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1 2025-12-04T09:17:18.9978431Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-12-04T09:17:18.9981198Z * [new branch] update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1 2025-12-04T09:17:18.9982954Z * [new branch] update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1 2025-12-04T09:17:18.9984567Z * [new branch] update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1 2025-12-04T09:17:18.9986386Z * [new branch] update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1 2025-12-04T09:17:18.9988174Z * [new branch] update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1 2025-12-04T09:17:18.9990713Z * [new branch] update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1 2025-12-04T09:17:18.9992621Z * [new branch] update-vllm-dockerfile -> origin/update-vllm-dockerfile 2025-12-04T09:17:18.9995283Z * [new branch] update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1 2025-12-04T09:17:18.9997060Z * [new branch] update-xla-commit-hash/19422028566-212-1 -> 
origin/update-xla-commit-hash/19422028566-212-1 2025-12-04T09:17:18.9998824Z * [new branch] update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1 2025-12-04T09:17:19.0000855Z * [new branch] update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-12-04T09:17:19.0002660Z * [new branch] update_operator_readme -> origin/update_operator_readme 2025-12-04T09:17:19.0004628Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-12-04T09:17:19.0006594Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-12-04T09:17:19.0008830Z * [new branch] update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677 2025-12-04T09:17:19.0010886Z * [new branch] update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283 2025-12-04T09:17:19.0013250Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-12-04T09:17:19.0014696Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-12-04T09:17:19.0016657Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-12-04T09:17:19.0018541Z * [new branch] upload-tests-for-autorevert -> origin/upload-tests-for-autorevert 2025-12-04T09:17:19.0020664Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-12-04T09:17:19.0022723Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-12-04T09:17:19.0024801Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-12-04T09:17:19.0027059Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-12-04T09:17:19.0029146Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-12-04T09:17:19.0031109Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-12-04T09:17:19.0033109Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-12-04T09:17:19.0035054Z * [new branch] validate_fn -> origin/validate_fn 2025-12-04T09:17:19.0037157Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-12-04T09:17:19.0039143Z * [new branch] validations_2.8 -> 
origin/validations_2.8 2025-12-04T09:17:19.0041133Z * [new branch] varlen-api -> origin/varlen-api 2025-12-04T09:17:19.0043070Z * [new branch] varlen-api-backup -> origin/varlen-api-backup 2025-12-04T09:17:19.0045472Z * [new branch] varlen_batch_invariance -> origin/varlen_batch_invariance 2025-12-04T09:17:19.0047794Z * [new branch] viable/strict -> origin/viable/strict 2025-12-04T09:17:19.0050542Z * [new branch] vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy 2025-12-04T09:17:19.0052420Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-12-04T09:17:19.0054874Z * [new branch] vllmpin -> origin/vllmpin 2025-12-04T09:17:19.0056957Z * [new branch] vscode-recommend-pyrefly -> origin/vscode-recommend-pyrefly 2025-12-04T09:17:19.0059033Z * [new branch] wdvr-patch-1 -> origin/wdvr-patch-1 2025-12-04T09:17:19.0061709Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-12-04T09:17:19.0064193Z * [new branch] whc/pei -> origin/whc/pei 2025-12-04T09:17:19.0065882Z * [new branch] whc/pp_fix -> origin/whc/pp_fix 2025-12-04T09:17:19.0067724Z * [new branch] whc/sharding -> origin/whc/sharding 2025-12-04T09:17:19.0069540Z * [new branch] whc/sharding2 -> origin/whc/sharding2 2025-12-04T09:17:19.0071227Z * [new branch] whc/uneven -> origin/whc/uneven 2025-12-04T09:17:19.0073301Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-12-04T09:17:19.0075215Z * [new branch] win_warnings -> origin/win_warnings 2025-12-04T09:17:19.0077461Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-12-04T09:17:19.0079478Z * [new branch] xmfan-war -> origin/xmfan-war 2025-12-04T09:17:19.0082117Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-12-04T09:17:19.0083886Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-12-04T09:17:19.0085931Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 
2025-12-04T09:17:19.0087313Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-12-04T09:17:19.0089024Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-12-04T09:17:19.0090694Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-12-04T09:17:19.0092513Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-12-04T09:17:19.0094613Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-12-04T09:17:19.0096801Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-12-04T09:17:19.0098663Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-12-04T09:17:19.0100645Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-12-04T09:17:19.0102349Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-12-04T09:17:19.0104120Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-12-04T09:17:19.0105958Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-12-04T09:17:19.0107886Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-12-04T09:17:19.0112875Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-12-04T09:17:19.0113227Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-12-04T09:17:19.0113598Z * [new branch] xmfan/cacu_jun18 -> origin/xmfan/cacu_jun18 2025-12-04T09:17:19.0115650Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-12-04T09:17:19.0117497Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-12-04T09:17:19.0119400Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-12-04T09:17:19.0121277Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-12-04T09:17:19.0123324Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T09:17:19.0125286Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 
-> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T09:17:19.0126828Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-12-04T09:17:19.0128640Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-12-04T09:17:19.0130518Z * [new branch] xmfan/test -> origin/xmfan/test 2025-12-04T09:17:19.0133181Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-12-04T09:17:19.0134882Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-12-04T09:17:19.0136673Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-12-04T09:17:19.0139257Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 2025-12-04T09:17:19.0141253Z * [new branch] yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop 2025-12-04T09:17:19.0143017Z * [new branch] yolo-llama3 -> origin/yolo-llama3 2025-12-04T09:17:19.0145575Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-12-04T09:17:19.0147496Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-12-04T09:17:19.0149183Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-12-04T09:17:19.0150842Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-12-04T09:17:19.0153039Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-12-04T09:17:19.0154847Z * [new branch] zb2p -> origin/zb2p 2025-12-04T09:17:19.0156814Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-12-04T09:17:19.0159904Z * [new branch] zhxchen17/ci/vllm_lora_oom -> origin/zhxchen17/ci/vllm_lora_oom 2025-12-04T09:17:19.0161677Z * [new branch] zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom 2025-12-04T09:17:19.0163724Z * [new branch] zhxchen17/ci/vllm_pin -> origin/zhxchen17/ci/vllm_pin 2025-12-04T09:17:19.0166224Z * [new branch] zhxchen17/dynamo/unsafe_drop_all_guards -> 
origin/zhxchen17/dynamo/unsafe_drop_all_guards 2025-12-04T09:17:19.0168598Z * [new branch] zhxchen17/export/call_override -> origin/zhxchen17/export/call_override 2025-12-04T09:17:19.0170893Z * [new branch] zhxchen17/export/codemod1 -> origin/zhxchen17/export/codemod1 2025-12-04T09:17:19.0172726Z * [new branch] zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return 2025-12-04T09:17:19.0174676Z * [new branch] zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn 2025-12-04T09:17:19.0176351Z * [new branch] zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check 2025-12-04T09:17:19.0178826Z * [new branch] zhxchen17/precompile/aoti -> origin/zhxchen17/precompile/aoti 2025-12-04T09:17:19.0180856Z * [new branch] zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals 2025-12-04T09:17:19.0182706Z * [new branch] zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards 2025-12-04T09:17:19.0185031Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-12-04T09:17:19.0186937Z * [new branch] zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update 2025-12-04T09:17:19.0189541Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-12-04T09:17:19.0192095Z * [new branch] zxiiro/build-times -> origin/zxiiro/build-times 2025-12-04T09:17:19.0193944Z * [new branch] zxiiro/c7i.2xlarge -> origin/zxiiro/c7i.2xlarge 2025-12-04T09:17:19.0195755Z * [new branch] zxiiro/c7i.2xlarge.h100 -> origin/zxiiro/c7i.2xlarge.h100 2025-12-04T09:17:19.0197600Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-12-04T09:17:19.0199338Z * [new branch] zxiiro/risc64 -> origin/zxiiro/risc64 2025-12-04T09:17:19.0201171Z * [new branch] zxiiro/test-multicloud-arc -> origin/zxiiro/test-multicloud-arc 2025-12-04T09:17:19.0202884Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 
2025-12-04T09:17:19.0204358Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-12-04T09:17:19.0206064Z * [new tag] ciflow/b200/115316 -> ciflow/b200/115316 2025-12-04T09:17:19.0207277Z * [new tag] ciflow/b200/160685 -> ciflow/b200/160685 2025-12-04T09:17:19.0208702Z * [new tag] ciflow/b200/161607 -> ciflow/b200/161607 2025-12-04T09:17:19.0212311Z * [new tag] ciflow/b200/161938 -> ciflow/b200/161938 2025-12-04T09:17:19.0213628Z * [new tag] ciflow/b200/167207 -> ciflow/b200/167207 2025-12-04T09:17:19.0214869Z * [new tag] ciflow/b200/167989 -> ciflow/b200/167989 2025-12-04T09:17:19.0216276Z * [new tag] ciflow/b200/168096 -> ciflow/b200/168096 2025-12-04T09:17:19.0217718Z * [new tag] ciflow/b200/168175 -> ciflow/b200/168175 2025-12-04T09:17:19.0219172Z * [new tag] ciflow/b200/168195 -> ciflow/b200/168195 2025-12-04T09:17:19.0220563Z * [new tag] ciflow/b200/169200 -> ciflow/b200/169200 2025-12-04T09:17:19.0221900Z * [new tag] ciflow/b200/169216 -> ciflow/b200/169216 2025-12-04T09:17:19.0223657Z * [new tag] ciflow/b200/169380 -> ciflow/b200/169380 2025-12-04T09:17:19.0225478Z * [new tag] ciflow/b200/169412 -> ciflow/b200/169412 2025-12-04T09:17:19.0227018Z * [new tag] ciflow/b200/169470 -> ciflow/b200/169470 2025-12-04T09:17:19.0228837Z * [new tag] ciflow/b200/169471 -> ciflow/b200/169471 2025-12-04T09:17:19.0230280Z * [new tag] ciflow/b200/169472 -> ciflow/b200/169472 2025-12-04T09:17:19.0231757Z * [new tag] ciflow/b200/169514 -> ciflow/b200/169514 2025-12-04T09:17:19.0233051Z * [new tag] ciflow/b200/169517 -> ciflow/b200/169517 2025-12-04T09:17:19.0234750Z * [new tag] ciflow/binaries/165922 -> ciflow/binaries/165922 2025-12-04T09:17:19.0236070Z * [new tag] ciflow/binaries/169510 -> ciflow/binaries/169510 2025-12-04T09:17:19.0237790Z * [new tag] ciflow/binaries_wheel/157994 -> ciflow/binaries_wheel/157994 2025-12-04T09:17:19.0239353Z * [new tag] ciflow/binaries_wheel/166829 -> ciflow/binaries_wheel/166829 2025-12-04T09:17:19.0240487Z * [new tag] 
ciflow/binaries_wheel/167972 -> ciflow/binaries_wheel/167972 2025-12-04T09:17:19.0242090Z * [new tag] ciflow/binaries_wheel/167981 -> ciflow/binaries_wheel/167981 2025-12-04T09:17:19.0243537Z * [new tag] ciflow/dynamo/167695 -> ciflow/dynamo/167695 2025-12-04T09:17:19.0244752Z * [new tag] ciflow/dynamo/168096 -> ciflow/dynamo/168096 2025-12-04T09:17:19.0246155Z * [new tag] ciflow/dynamo/169525 -> ciflow/dynamo/169525 2025-12-04T09:17:19.0247720Z * [new tag] ciflow/h100-cutlass-backend/161938 -> ciflow/h100-cutlass-backend/161938 2025-12-04T09:17:19.0248721Z * [new tag] ciflow/h100-cutlass-backend/161940 -> ciflow/h100-cutlass-backend/161940 2025-12-04T09:17:19.0250528Z * [new tag] ciflow/h100-distributed/168923 -> ciflow/h100-distributed/168923 2025-12-04T09:17:19.0252055Z * [new tag] ciflow/h100-symm-mem/167552 -> ciflow/h100-symm-mem/167552 2025-12-04T09:17:19.0253160Z * [new tag] ciflow/h100-symm-mem/168129 -> ciflow/h100-symm-mem/168129 2025-12-04T09:17:19.0254483Z * [new tag] ciflow/h100-symm-mem/168917 -> ciflow/h100-symm-mem/168917 2025-12-04T09:17:19.0256034Z * [new tag] ciflow/h100-symm-mem/169156 -> ciflow/h100-symm-mem/169156 2025-12-04T09:17:19.0257279Z * [new tag] ciflow/h100-symm-mem/169200 -> ciflow/h100-symm-mem/169200 2025-12-04T09:17:19.0258609Z * [new tag] ciflow/h100-symm-mem/169216 -> ciflow/h100-symm-mem/169216 2025-12-04T09:17:19.0259975Z * [new tag] ciflow/h100-symm-mem/169338 -> ciflow/h100-symm-mem/169338 2025-12-04T09:17:19.0261380Z * [new tag] ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355 2025-12-04T09:17:19.0262502Z * [new tag] ciflow/h100-symm-mem/169543 -> ciflow/h100-symm-mem/169543 2025-12-04T09:17:19.0264024Z * [new tag] ciflow/h100/115316 -> ciflow/h100/115316 2025-12-04T09:17:19.0265273Z * [new tag] ciflow/h100/160685 -> ciflow/h100/160685 2025-12-04T09:17:19.0266490Z * [new tag] ciflow/h100/160729 -> ciflow/h100/160729 2025-12-04T09:17:19.0267751Z * [new tag] ciflow/h100/161607 -> ciflow/h100/161607 
2025-12-04T09:17:19.0268973Z * [new tag] ciflow/h100/161938 -> ciflow/h100/161938 2025-12-04T09:17:19.0270325Z * [new tag] ciflow/h100/167207 -> ciflow/h100/167207 2025-12-04T09:17:19.0271229Z * [new tag] ciflow/h100/167989 -> ciflow/h100/167989 2025-12-04T09:17:19.0272682Z * [new tag] ciflow/h100/168096 -> ciflow/h100/168096 2025-12-04T09:17:19.0273666Z * [new tag] ciflow/h100/168175 -> ciflow/h100/168175 2025-12-04T09:17:19.0275114Z * [new tag] ciflow/h100/168195 -> ciflow/h100/168195 2025-12-04T09:17:19.0276337Z * [new tag] ciflow/h100/168980 -> ciflow/h100/168980 2025-12-04T09:17:19.0277927Z * [new tag] ciflow/h100/169200 -> ciflow/h100/169200 2025-12-04T09:17:19.0279575Z * [new tag] ciflow/h100/169216 -> ciflow/h100/169216 2025-12-04T09:17:19.0281075Z * [new tag] ciflow/h100/169380 -> ciflow/h100/169380 2025-12-04T09:17:19.0282372Z * [new tag] ciflow/h100/169412 -> ciflow/h100/169412 2025-12-04T09:17:19.0283668Z * [new tag] ciflow/h100/169470 -> ciflow/h100/169470 2025-12-04T09:17:19.0284970Z * [new tag] ciflow/h100/169471 -> ciflow/h100/169471 2025-12-04T09:17:19.0286234Z * [new tag] ciflow/h100/169472 -> ciflow/h100/169472 2025-12-04T09:17:19.0287550Z * [new tag] ciflow/h100/169514 -> ciflow/h100/169514 2025-12-04T09:17:19.0289097Z * [new tag] ciflow/inductor-cu126/168096 -> ciflow/inductor-cu126/168096 2025-12-04T09:17:19.0291036Z * [new tag] ciflow/inductor-micro-benchmark-cpu-x86/168096 -> ciflow/inductor-micro-benchmark-cpu-x86/168096 2025-12-04T09:17:19.0292483Z * [new tag] ciflow/inductor-micro-benchmark/166165 -> ciflow/inductor-micro-benchmark/166165 2025-12-04T09:17:19.0294204Z * [new tag] ciflow/inductor-micro-benchmark/168096 -> ciflow/inductor-micro-benchmark/168096 2025-12-04T09:17:19.0295835Z * [new tag] ciflow/inductor-perf-compare/168096 -> ciflow/inductor-perf-compare/168096 2025-12-04T09:17:19.0297741Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168073 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168073 
2025-12-04T09:17:19.0298756Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168096 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168096 2025-12-04T09:17:19.0300472Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi300/169024 2025-12-04T09:17:19.0302099Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi355/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi355/169024 2025-12-04T09:17:19.0303180Z * [new tag] ciflow/inductor-perf-test-nightly/168096 -> ciflow/inductor-perf-test-nightly/168096 2025-12-04T09:17:19.0305228Z * [new tag] ciflow/inductor-periodic/168096 -> ciflow/inductor-periodic/168096 2025-12-04T09:17:19.0305991Z * [new tag] ciflow/inductor-periodic/169024 -> ciflow/inductor-periodic/169024 2025-12-04T09:17:19.0307541Z * [new tag] ciflow/inductor-periodic/169425 -> ciflow/inductor-periodic/169425 2025-12-04T09:17:19.0309347Z * [new tag] ciflow/inductor-rocm-mi200/165545 -> ciflow/inductor-rocm-mi200/165545 2025-12-04T09:17:19.0310679Z * [new tag] ciflow/inductor-rocm-mi200/165997 -> ciflow/inductor-rocm-mi200/165997 2025-12-04T09:17:19.0312461Z * [new tag] ciflow/inductor-rocm-mi200/168096 -> ciflow/inductor-rocm-mi200/168096 2025-12-04T09:17:19.0313803Z * [new tag] ciflow/inductor-rocm-mi200/169063 -> ciflow/inductor-rocm-mi200/169063 2025-12-04T09:17:19.0314787Z * [new tag] ciflow/inductor-rocm-mi200/169425 -> ciflow/inductor-rocm-mi200/169425 2025-12-04T09:17:19.0316720Z * [new tag] ciflow/inductor-rocm-mi300/165545 -> ciflow/inductor-rocm-mi300/165545 2025-12-04T09:17:19.0317605Z * [new tag] ciflow/inductor-rocm-mi300/168096 -> ciflow/inductor-rocm-mi300/168096 2025-12-04T09:17:19.0319081Z * [new tag] ciflow/inductor-rocm-mi300/169063 -> ciflow/inductor-rocm-mi300/169063 2025-12-04T09:17:19.0320062Z * [new tag] ciflow/inductor-rocm-mi300/169425 -> ciflow/inductor-rocm-mi300/169425 2025-12-04T09:17:19.0321964Z * [new tag] ciflow/inductor-rocm/162052 -> 
ciflow/inductor-rocm/162052 2025-12-04T09:17:19.0323278Z * [new tag] ciflow/inductor-rocm/168971 -> ciflow/inductor-rocm/168971 2025-12-04T09:17:19.0324786Z * [new tag] ciflow/inductor-windows/168096 -> ciflow/inductor-windows/168096 2025-12-04T09:17:19.0326248Z * [new tag] ciflow/inductor/144542 -> ciflow/inductor/144542 2025-12-04T09:17:19.0327468Z * [new tag] ciflow/inductor/146506 -> ciflow/inductor/146506 2025-12-04T09:17:19.0329116Z * [new tag] ciflow/inductor/147990 -> ciflow/inductor/147990 2025-12-04T09:17:19.0330553Z * [new tag] ciflow/inductor/148294 -> ciflow/inductor/148294 2025-12-04T09:17:19.0331815Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-12-04T09:17:19.0333068Z * [new tag] ciflow/inductor/157149 -> ciflow/inductor/157149 2025-12-04T09:17:19.0334354Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-12-04T09:17:19.0335326Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-12-04T09:17:19.0336810Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-12-04T09:17:19.0338123Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-12-04T09:17:19.0339646Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-12-04T09:17:19.0341301Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-12-04T09:17:19.0343000Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-12-04T09:17:19.0344572Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-12-04T09:17:19.0345940Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-12-04T09:17:19.0347194Z * [new tag] ciflow/inductor/161940 -> ciflow/inductor/161940 2025-12-04T09:17:19.0348528Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-12-04T09:17:19.0349862Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-12-04T09:17:19.0351175Z * [new tag] ciflow/inductor/162795 -> ciflow/inductor/162795 2025-12-04T09:17:19.0352711Z * [new tag] 
ciflow/inductor/163245 -> ciflow/inductor/163245 2025-12-04T09:17:19.0354043Z * [new tag] ciflow/inductor/163335 -> ciflow/inductor/163335 2025-12-04T09:17:19.0355362Z * [new tag] ciflow/inductor/163503 -> ciflow/inductor/163503 2025-12-04T09:17:19.0356672Z * [new tag] ciflow/inductor/163942 -> ciflow/inductor/163942 2025-12-04T09:17:19.0358122Z * [new tag] ciflow/inductor/165270 -> ciflow/inductor/165270 2025-12-04T09:17:19.0359435Z * [new tag] ciflow/inductor/165274 -> ciflow/inductor/165274 2025-12-04T09:17:19.0360761Z * [new tag] ciflow/inductor/165322 -> ciflow/inductor/165322 2025-12-04T09:17:19.0362079Z * [new tag] ciflow/inductor/165597 -> ciflow/inductor/165597 2025-12-04T09:17:19.0363379Z * [new tag] ciflow/inductor/166063 -> ciflow/inductor/166063 2025-12-04T09:17:19.0364706Z * [new tag] ciflow/inductor/166075 -> ciflow/inductor/166075 2025-12-04T09:17:19.0366126Z * [new tag] ciflow/inductor/166165 -> ciflow/inductor/166165 2025-12-04T09:17:19.0367555Z * [new tag] ciflow/inductor/166254 -> ciflow/inductor/166254 2025-12-04T09:17:19.0368868Z * [new tag] ciflow/inductor/166483 -> ciflow/inductor/166483 2025-12-04T09:17:19.0370164Z * [new tag] ciflow/inductor/166494 -> ciflow/inductor/166494 2025-12-04T09:17:19.0371424Z * [new tag] ciflow/inductor/166545 -> ciflow/inductor/166545 2025-12-04T09:17:19.0372884Z * [new tag] ciflow/inductor/166788 -> ciflow/inductor/166788 2025-12-04T09:17:19.0374345Z * [new tag] ciflow/inductor/166846 -> ciflow/inductor/166846 2025-12-04T09:17:19.0375687Z * [new tag] ciflow/inductor/167300 -> ciflow/inductor/167300 2025-12-04T09:17:19.0377029Z * [new tag] ciflow/inductor/167407 -> ciflow/inductor/167407 2025-12-04T09:17:19.0378462Z * [new tag] ciflow/inductor/167536 -> ciflow/inductor/167536 2025-12-04T09:17:19.0379898Z * [new tag] ciflow/inductor/167552 -> ciflow/inductor/167552 2025-12-04T09:17:19.0381187Z * [new tag] ciflow/inductor/167555 -> ciflow/inductor/167555 2025-12-04T09:17:19.0382609Z * [new tag] 
ciflow/inductor/167583 -> ciflow/inductor/167583 2025-12-04T09:17:19.0383903Z * [new tag] ciflow/inductor/167599 -> ciflow/inductor/167599 2025-12-04T09:17:19.0385244Z * [new tag] ciflow/inductor/167647 -> ciflow/inductor/167647 2025-12-04T09:17:19.0386569Z * [new tag] ciflow/inductor/167677 -> ciflow/inductor/167677 2025-12-04T09:17:19.0387886Z * [new tag] ciflow/inductor/167680 -> ciflow/inductor/167680 2025-12-04T09:17:19.0389207Z * [new tag] ciflow/inductor/167695 -> ciflow/inductor/167695 2025-12-04T09:17:19.0390519Z * [new tag] ciflow/inductor/167742 -> ciflow/inductor/167742 2025-12-04T09:17:19.0391814Z * [new tag] ciflow/inductor/167768 -> ciflow/inductor/167768 2025-12-04T09:17:19.0393353Z * [new tag] ciflow/inductor/167773 -> ciflow/inductor/167773 2025-12-04T09:17:19.0394726Z * [new tag] ciflow/inductor/167781 -> ciflow/inductor/167781 2025-12-04T09:17:19.0396020Z * [new tag] ciflow/inductor/167880 -> ciflow/inductor/167880 2025-12-04T09:17:19.0397351Z * [new tag] ciflow/inductor/167887 -> ciflow/inductor/167887 2025-12-04T09:17:19.0399194Z * [new tag] ciflow/inductor/167972 -> ciflow/inductor/167972 2025-12-04T09:17:19.0400492Z * [new tag] ciflow/inductor/167989 -> ciflow/inductor/167989 2025-12-04T09:17:19.0401807Z * [new tag] ciflow/inductor/168002 -> ciflow/inductor/168002 2025-12-04T09:17:19.0403114Z * [new tag] ciflow/inductor/168050 -> ciflow/inductor/168050 2025-12-04T09:17:19.0404470Z * [new tag] ciflow/inductor/168051 -> ciflow/inductor/168051 2025-12-04T09:17:19.0405788Z * [new tag] ciflow/inductor/168052 -> ciflow/inductor/168052 2025-12-04T09:17:19.0407094Z * [new tag] ciflow/inductor/168073 -> ciflow/inductor/168073 2025-12-04T09:17:19.0408190Z * [new tag] ciflow/inductor/168096 -> ciflow/inductor/168096 2025-12-04T09:17:19.0409959Z * [new tag] ciflow/inductor/168114 -> ciflow/inductor/168114 2025-12-04T09:17:19.0411227Z * [new tag] ciflow/inductor/168115 -> ciflow/inductor/168115 2025-12-04T09:17:19.0412535Z * [new tag] 
ciflow/inductor/168127 -> ciflow/inductor/168127 2025-12-04T09:17:19.0413842Z * [new tag] ciflow/inductor/168129 -> ciflow/inductor/168129 2025-12-04T09:17:19.0415223Z * [new tag] ciflow/inductor/168157 -> ciflow/inductor/168157 2025-12-04T09:17:19.0416729Z * [new tag] ciflow/inductor/168175 -> ciflow/inductor/168175 2025-12-04T09:17:19.0417668Z * [new tag] ciflow/inductor/168185 -> ciflow/inductor/168185 2025-12-04T09:17:19.0419271Z * [new tag] ciflow/inductor/168195 -> ciflow/inductor/168195 2025-12-04T09:17:19.0420707Z * [new tag] ciflow/inductor/168209 -> ciflow/inductor/168209 2025-12-04T09:17:19.0421952Z * [new tag] ciflow/inductor/168266 -> ciflow/inductor/168266 2025-12-04T09:17:19.0423236Z * [new tag] ciflow/inductor/168316 -> ciflow/inductor/168316 2025-12-04T09:17:19.0424721Z * [new tag] ciflow/inductor/168326 -> ciflow/inductor/168326 2025-12-04T09:17:19.0426067Z * [new tag] ciflow/inductor/168368 -> ciflow/inductor/168368 2025-12-04T09:17:19.0427438Z * [new tag] ciflow/inductor/168894 -> ciflow/inductor/168894 2025-12-04T09:17:19.0428780Z * [new tag] ciflow/inductor/168934 -> ciflow/inductor/168934 2025-12-04T09:17:19.0430069Z * [new tag] ciflow/inductor/168939 -> ciflow/inductor/168939 2025-12-04T09:17:19.0431446Z * [new tag] ciflow/inductor/168946 -> ciflow/inductor/168946 2025-12-04T09:17:19.0432706Z * [new tag] ciflow/inductor/168950 -> ciflow/inductor/168950 2025-12-04T09:17:19.0434039Z * [new tag] ciflow/inductor/168951 -> ciflow/inductor/168951 2025-12-04T09:17:19.0435364Z * [new tag] ciflow/inductor/168952 -> ciflow/inductor/168952 2025-12-04T09:17:19.0436662Z * [new tag] ciflow/inductor/168955 -> ciflow/inductor/168955 2025-12-04T09:17:19.0437966Z * [new tag] ciflow/inductor/168971 -> ciflow/inductor/168971 2025-12-04T09:17:19.0439281Z * [new tag] ciflow/inductor/168979 -> ciflow/inductor/168979 2025-12-04T09:17:19.0440603Z * [new tag] ciflow/inductor/168980 -> ciflow/inductor/168980 2025-12-04T09:17:19.0442067Z * [new tag] 
ciflow/inductor/168983 -> ciflow/inductor/168983 2025-12-04T09:17:19.0443363Z * [new tag] ciflow/inductor/169006 -> ciflow/inductor/169006 2025-12-04T09:17:19.0444754Z * [new tag] ciflow/inductor/169023 -> ciflow/inductor/169023 2025-12-04T09:17:19.0446100Z * [new tag] ciflow/inductor/169024 -> ciflow/inductor/169024 2025-12-04T09:17:19.0447450Z * [new tag] ciflow/inductor/169025 -> ciflow/inductor/169025 2025-12-04T09:17:19.0448753Z * [new tag] ciflow/inductor/169066 -> ciflow/inductor/169066 2025-12-04T09:17:19.0450076Z * [new tag] ciflow/inductor/169091 -> ciflow/inductor/169091 2025-12-04T09:17:19.0451415Z * [new tag] ciflow/inductor/169102 -> ciflow/inductor/169102 2025-12-04T09:17:19.0452705Z * [new tag] ciflow/inductor/169103 -> ciflow/inductor/169103 2025-12-04T09:17:19.0454037Z * [new tag] ciflow/inductor/169121 -> ciflow/inductor/169121 2025-12-04T09:17:19.0455348Z * [new tag] ciflow/inductor/169134 -> ciflow/inductor/169134 2025-12-04T09:17:19.0456658Z * [new tag] ciflow/inductor/169135 -> ciflow/inductor/169135 2025-12-04T09:17:19.0457947Z * [new tag] ciflow/inductor/169141 -> ciflow/inductor/169141 2025-12-04T09:17:19.0459492Z * [new tag] ciflow/inductor/169151 -> ciflow/inductor/169151 2025-12-04T09:17:19.0460997Z * [new tag] ciflow/inductor/169161 -> ciflow/inductor/169161 2025-12-04T09:17:19.0462311Z * [new tag] ciflow/inductor/169167 -> ciflow/inductor/169167 2025-12-04T09:17:19.0463802Z * [new tag] ciflow/inductor/169177 -> ciflow/inductor/169177 2025-12-04T09:17:19.0465398Z * [new tag] ciflow/inductor/169185 -> ciflow/inductor/169185 2025-12-04T09:17:19.0466638Z * [new tag] ciflow/inductor/169196 -> ciflow/inductor/169196 2025-12-04T09:17:19.0467955Z * [new tag] ciflow/inductor/169200 -> ciflow/inductor/169200 2025-12-04T09:17:19.0469261Z * [new tag] ciflow/inductor/169204 -> ciflow/inductor/169204 2025-12-04T09:17:19.0470503Z * [new tag] ciflow/inductor/169216 -> ciflow/inductor/169216 2025-12-04T09:17:19.0471916Z * [new tag] 
ciflow/inductor/169219 -> ciflow/inductor/169219 2025-12-04T09:17:19.0473232Z * [new tag] ciflow/inductor/169220 -> ciflow/inductor/169220 2025-12-04T09:17:19.0474674Z * [new tag] ciflow/inductor/169230 -> ciflow/inductor/169230 2025-12-04T09:17:19.0475986Z * [new tag] ciflow/inductor/169242 -> ciflow/inductor/169242 2025-12-04T09:17:19.0477309Z * [new tag] ciflow/inductor/169245 -> ciflow/inductor/169245 2025-12-04T09:17:19.0478770Z * [new tag] ciflow/inductor/169260 -> ciflow/inductor/169260 2025-12-04T09:17:19.0480114Z * [new tag] ciflow/inductor/169282 -> ciflow/inductor/169282 2025-12-04T09:17:19.0481422Z * [new tag] ciflow/inductor/169286 -> ciflow/inductor/169286 2025-12-04T09:17:19.0482728Z * [new tag] ciflow/inductor/169299 -> ciflow/inductor/169299 2025-12-04T09:17:19.0484179Z * [new tag] ciflow/inductor/169304 -> ciflow/inductor/169304 2025-12-04T09:17:19.0486413Z * [new tag] ciflow/inductor/169305 -> ciflow/inductor/169305 2025-12-04T09:17:19.0487732Z * [new tag] ciflow/inductor/169308 -> ciflow/inductor/169308 2025-12-04T09:17:19.0489056Z * [new tag] ciflow/inductor/169319 -> ciflow/inductor/169319 2025-12-04T09:17:19.0490411Z * [new tag] ciflow/inductor/169326 -> ciflow/inductor/169326 2025-12-04T09:17:19.0491723Z * [new tag] ciflow/inductor/169332 -> ciflow/inductor/169332 2025-12-04T09:17:19.0493052Z * [new tag] ciflow/inductor/169333 -> ciflow/inductor/169333 2025-12-04T09:17:19.0494571Z * [new tag] ciflow/inductor/169336 -> ciflow/inductor/169336 2025-12-04T09:17:19.0495945Z * [new tag] ciflow/inductor/169340 -> ciflow/inductor/169340 2025-12-04T09:17:19.0497266Z * [new tag] ciflow/inductor/169341 -> ciflow/inductor/169341 2025-12-04T09:17:19.0498588Z * [new tag] ciflow/inductor/169343 -> ciflow/inductor/169343 2025-12-04T09:17:19.0500025Z * [new tag] ciflow/inductor/169346 -> ciflow/inductor/169346 2025-12-04T09:17:19.0501528Z * [new tag] ciflow/inductor/169348 -> ciflow/inductor/169348 2025-12-04T09:17:19.0503230Z * [new tag] 
ciflow/inductor/169350 -> ciflow/inductor/169350 2025-12-04T09:17:19.0504662Z * [new tag] ciflow/inductor/169355 -> ciflow/inductor/169355 2025-12-04T09:17:19.0506023Z * [new tag] ciflow/inductor/169370 -> ciflow/inductor/169370 2025-12-04T09:17:19.0507944Z * [new tag] ciflow/inductor/169375 -> ciflow/inductor/169375 2025-12-04T09:17:19.0509236Z * [new tag] ciflow/inductor/169389 -> ciflow/inductor/169389 2025-12-04T09:17:19.0510524Z * [new tag] ciflow/inductor/169391 -> ciflow/inductor/169391 2025-12-04T09:17:19.0511828Z * [new tag] ciflow/inductor/169393 -> ciflow/inductor/169393 2025-12-04T09:17:19.0513196Z * [new tag] ciflow/inductor/169399 -> ciflow/inductor/169399 2025-12-04T09:17:19.0514649Z * [new tag] ciflow/inductor/169400 -> ciflow/inductor/169400 2025-12-04T09:17:19.0515960Z * [new tag] ciflow/inductor/169415 -> ciflow/inductor/169415 2025-12-04T09:17:19.0517452Z * [new tag] ciflow/inductor/169417 -> ciflow/inductor/169417 2025-12-04T09:17:19.0518580Z * [new tag] ciflow/inductor/169418 -> ciflow/inductor/169418 2025-12-04T09:17:19.0520211Z * [new tag] ciflow/inductor/169430 -> ciflow/inductor/169430 2025-12-04T09:17:19.0521454Z * [new tag] ciflow/inductor/169432 -> ciflow/inductor/169432 2025-12-04T09:17:19.0522893Z * [new tag] ciflow/inductor/169436 -> ciflow/inductor/169436 2025-12-04T09:17:19.0524330Z * [new tag] ciflow/inductor/169437 -> ciflow/inductor/169437 2025-12-04T09:17:19.0525681Z * [new tag] ciflow/inductor/169438 -> ciflow/inductor/169438 2025-12-04T09:17:19.0527029Z * [new tag] ciflow/inductor/169441 -> ciflow/inductor/169441 2025-12-04T09:17:19.0528339Z * [new tag] ciflow/inductor/169446 -> ciflow/inductor/169446 2025-12-04T09:17:19.0529982Z * [new tag] ciflow/inductor/169447 -> ciflow/inductor/169447 2025-12-04T09:17:19.0531313Z * [new tag] ciflow/inductor/169452 -> ciflow/inductor/169452 2025-12-04T09:17:19.0532791Z * [new tag] ciflow/inductor/169455 -> ciflow/inductor/169455 2025-12-04T09:17:19.0534132Z * [new tag] 
ciflow/inductor/169459 -> ciflow/inductor/169459 2025-12-04T09:17:19.0535580Z * [new tag] ciflow/inductor/169463 -> ciflow/inductor/169463 2025-12-04T09:17:19.0537066Z * [new tag] ciflow/inductor/169476 -> ciflow/inductor/169476 2025-12-04T09:17:19.0538381Z * [new tag] ciflow/inductor/169485 -> ciflow/inductor/169485 2025-12-04T09:17:19.0539847Z * [new tag] ciflow/inductor/169493 -> ciflow/inductor/169493 2025-12-04T09:17:19.0541156Z * [new tag] ciflow/inductor/169496 -> ciflow/inductor/169496 2025-12-04T09:17:19.0542453Z * [new tag] ciflow/inductor/169497 -> ciflow/inductor/169497 2025-12-04T09:17:19.0543822Z * [new tag] ciflow/inductor/169503 -> ciflow/inductor/169503 2025-12-04T09:17:19.0545168Z * [new tag] ciflow/inductor/169504 -> ciflow/inductor/169504 2025-12-04T09:17:19.0546762Z * [new tag] ciflow/inductor/169505 -> ciflow/inductor/169505 2025-12-04T09:17:19.0548489Z * [new tag] ciflow/inductor/169508 -> ciflow/inductor/169508 2025-12-04T09:17:19.0549914Z * [new tag] ciflow/inductor/169509 -> ciflow/inductor/169509 2025-12-04T09:17:19.0551345Z * [new tag] ciflow/inductor/169513 -> ciflow/inductor/169513 2025-12-04T09:17:19.0552688Z * [new tag] ciflow/inductor/169514 -> ciflow/inductor/169514 2025-12-04T09:17:19.0554010Z * [new tag] ciflow/inductor/169515 -> ciflow/inductor/169515 2025-12-04T09:17:19.0555337Z * [new tag] ciflow/inductor/169517 -> ciflow/inductor/169517 2025-12-04T09:17:19.0556673Z * [new tag] ciflow/inductor/169519 -> ciflow/inductor/169519 2025-12-04T09:17:19.0558017Z * [new tag] ciflow/inductor/169520 -> ciflow/inductor/169520 2025-12-04T09:17:19.0559360Z * [new tag] ciflow/inductor/169521 -> ciflow/inductor/169521 2025-12-04T09:17:19.0560680Z * [new tag] ciflow/inductor/169524 -> ciflow/inductor/169524 2025-12-04T09:17:19.0562063Z * [new tag] ciflow/inductor/169527 -> ciflow/inductor/169527 2025-12-04T09:17:19.0563393Z * [new tag] ciflow/inductor/169528 -> ciflow/inductor/169528 2025-12-04T09:17:19.0564840Z * [new tag] 
ciflow/inductor/169532 -> ciflow/inductor/169532 2025-12-04T09:17:19.0566170Z * [new tag] ciflow/inductor/169535 -> ciflow/inductor/169535 2025-12-04T09:17:19.0567494Z * [new tag] ciflow/inductor/169536 -> ciflow/inductor/169536 2025-12-04T09:17:19.0568966Z * [new tag] ciflow/inductor/169547 -> ciflow/inductor/169547 2025-12-04T09:17:19.0569905Z * [new tag] ciflow/inductor/169548 -> ciflow/inductor/169548 2025-12-04T09:17:19.0571504Z * [new tag] ciflow/inductor/169549 -> ciflow/inductor/169549 2025-12-04T09:17:19.0572880Z * [new tag] ciflow/inductor/169551 -> ciflow/inductor/169551 2025-12-04T09:17:19.0574180Z * [new tag] ciflow/inductor/169552 -> ciflow/inductor/169552 2025-12-04T09:17:19.0576034Z * [new tag] ciflow/inductor/169553 -> ciflow/inductor/169553 2025-12-04T09:17:19.0577377Z * [new tag] ciflow/inductor/169557 -> ciflow/inductor/169557 2025-12-04T09:17:19.0579103Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-12-04T09:17:19.0580830Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-12-04T09:17:19.0582315Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-12-04T09:17:19.0583907Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-12-04T09:17:19.0585024Z * [new tag] ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075 2025-12-04T09:17:19.0586327Z * [new tag] ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876 2025-12-04T09:17:19.0587446Z * [new tag] ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981 2025-12-04T09:17:19.0589055Z * [new tag] ciflow/mps/166254 -> ciflow/mps/166254 2025-12-04T09:17:19.0590422Z * [new tag] ciflow/mps/169017 -> ciflow/mps/169017 2025-12-04T09:17:19.0591895Z * [new tag] ciflow/mps/169372 -> ciflow/mps/169372 2025-12-04T09:17:19.0593124Z * [new tag] ciflow/mps/169478 -> ciflow/mps/169478 2025-12-04T09:17:19.0594717Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-12-04T09:17:19.0596456Z * [new tag] 
ciflow/op-benchmark/166075 -> ciflow/op-benchmark/166075 2025-12-04T09:17:19.0597386Z * [new tag] ciflow/op-benchmark/169544 -> ciflow/op-benchmark/169544 2025-12-04T09:17:19.0599248Z * [new tag] ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997 2025-12-04T09:17:19.0600650Z * [new tag] ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517 2025-12-04T09:17:19.0601827Z * [new tag] ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063 2025-12-04T09:17:19.0603206Z * [new tag] ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425 2025-12-04T09:17:19.0604732Z * [new tag] ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517 2025-12-04T09:17:19.0606029Z * [new tag] ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063 2025-12-04T09:17:19.0607022Z * [new tag] ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425 2025-12-04T09:17:19.0609270Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-12-04T09:17:19.0610390Z * [new tag] ciflow/periodic/167207 -> ciflow/periodic/167207 2025-12-04T09:17:19.0611872Z * [new tag] ciflow/periodic/167978 -> ciflow/periodic/167978 2025-12-04T09:17:19.0613105Z * [new tag] ciflow/periodic/168096 -> ciflow/periodic/168096 2025-12-04T09:17:19.0614315Z * [new tag] ciflow/periodic/169286 -> ciflow/periodic/169286 2025-12-04T09:17:19.0615772Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-12-04T09:17:19.0617209Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-12-04T09:17:19.0618747Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-12-04T09:17:19.0620156Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-12-04T09:17:19.0622254Z * [new tag] ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T09:17:19.0623876Z * [new tag] ciflow/periodic/94512-point -> 
ciflow/periodic/94512-point 2025-12-04T09:17:19.0625681Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-12-04T09:17:19.0627172Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-12-04T09:17:19.0628612Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 2025-12-04T09:17:19.0630187Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-12-04T09:17:19.0631922Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-12-04T09:17:19.0633493Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-12-04T09:17:19.0634974Z * [new tag] ciflow/pull/167207 -> ciflow/pull/167207 2025-12-04T09:17:19.0636846Z * [new tag] ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207 2025-12-04T09:17:19.0638319Z * [new tag] ciflow/rocm-mi200/165545 -> ciflow/rocm-mi200/165545 2025-12-04T09:17:19.0639545Z * [new tag] ciflow/rocm-mi200/165997 -> ciflow/rocm-mi200/165997 2025-12-04T09:17:19.0640761Z * [new tag] ciflow/rocm-mi200/168096 -> ciflow/rocm-mi200/168096 2025-12-04T09:17:19.0642187Z * [new tag] ciflow/rocm-mi200/168275 -> ciflow/rocm-mi200/168275 2025-12-04T09:17:19.0643414Z * [new tag] ciflow/rocm-mi200/169063 -> ciflow/rocm-mi200/169063 2025-12-04T09:17:19.0644808Z * [new tag] ciflow/rocm-mi200/169356 -> ciflow/rocm-mi200/169356 2025-12-04T09:17:19.0645898Z * [new tag] ciflow/rocm-mi200/169425 -> ciflow/rocm-mi200/169425 2025-12-04T09:17:19.0647549Z * [new tag] ciflow/rocm-mi300/165545 -> ciflow/rocm-mi300/165545 2025-12-04T09:17:19.0649006Z * [new tag] ciflow/rocm-mi300/167157 -> ciflow/rocm-mi300/167157 2025-12-04T09:17:19.0650228Z * [new tag] ciflow/rocm-mi300/168096 -> ciflow/rocm-mi300/168096 2025-12-04T09:17:19.0651455Z * [new tag] ciflow/rocm-mi300/169063 -> ciflow/rocm-mi300/169063 2025-12-04T09:17:19.0652537Z * [new tag] ciflow/rocm-mi300/169425 -> ciflow/rocm-mi300/169425 2025-12-04T09:17:19.0654175Z * 
[new tag] ciflow/rocm-mi355/167157 -> ciflow/rocm-mi355/167157 2025-12-04T09:17:19.0655501Z * [new tag] ciflow/rocm-mi355/168275 -> ciflow/rocm-mi355/168275 2025-12-04T09:17:19.0656737Z * [new tag] ciflow/rocm-mi355/169425 -> ciflow/rocm-mi355/169425 2025-12-04T09:17:19.0658302Z * [new tag] ciflow/rocm-navi31/168275 -> ciflow/rocm-navi31/168275 2025-12-04T09:17:19.0659610Z * [new tag] ciflow/rocm-navi31/169425 -> ciflow/rocm-navi31/169425 2025-12-04T09:17:19.0661121Z * [new tag] ciflow/rocm/115316 -> ciflow/rocm/115316 2025-12-04T09:17:19.0662348Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-12-04T09:17:19.0663583Z * [new tag] ciflow/rocm/160685 -> ciflow/rocm/160685 2025-12-04T09:17:19.0664808Z * [new tag] ciflow/rocm/161607 -> ciflow/rocm/161607 2025-12-04T09:17:19.0666108Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-12-04T09:17:19.0667332Z * [new tag] ciflow/rocm/165997 -> ciflow/rocm/165997 2025-12-04T09:17:19.0668697Z * [new tag] ciflow/rocm/166165 -> ciflow/rocm/166165 2025-12-04T09:17:19.0669632Z * [new tag] ciflow/rocm/166517 -> ciflow/rocm/166517 2025-12-04T09:17:19.0671082Z * [new tag] ciflow/rocm/167207 -> ciflow/rocm/167207 2025-12-04T09:17:19.0672316Z * [new tag] ciflow/rocm/167536 -> ciflow/rocm/167536 2025-12-04T09:17:19.0673314Z * [new tag] ciflow/rocm/167781 -> ciflow/rocm/167781 2025-12-04T09:17:19.0675126Z * [new tag] ciflow/rocm/167989 -> ciflow/rocm/167989 2025-12-04T09:17:19.0676818Z * [new tag] ciflow/rocm/168073 -> ciflow/rocm/168073 2025-12-04T09:17:19.0678368Z * [new tag] ciflow/rocm/168195 -> ciflow/rocm/168195 2025-12-04T09:17:19.0679706Z * [new tag] ciflow/rocm/168939 -> ciflow/rocm/168939 2025-12-04T09:17:19.0681001Z * [new tag] ciflow/rocm/168971 -> ciflow/rocm/168971 2025-12-04T09:17:19.0682309Z * [new tag] ciflow/rocm/169024 -> ciflow/rocm/169024 2025-12-04T09:17:19.0683597Z * [new tag] ciflow/rocm/169200 -> ciflow/rocm/169200 2025-12-04T09:17:19.0684880Z * [new tag] ciflow/rocm/169216 -> 
ciflow/rocm/169216 2025-12-04T09:17:19.0686183Z * [new tag] ciflow/rocm/169312 -> ciflow/rocm/169312 2025-12-04T09:17:19.0687492Z * [new tag] ciflow/rocm/169380 -> ciflow/rocm/169380 2025-12-04T09:17:19.0688858Z * [new tag] ciflow/rocm/169427 -> ciflow/rocm/169427 2025-12-04T09:17:19.0690163Z * [new tag] ciflow/rocm/169455 -> ciflow/rocm/169455 2025-12-04T09:17:19.0691439Z * [new tag] ciflow/rocm/169470 -> ciflow/rocm/169470 2025-12-04T09:17:19.0692734Z * [new tag] ciflow/rocm/169471 -> ciflow/rocm/169471 2025-12-04T09:17:19.0694048Z * [new tag] ciflow/rocm/169472 -> ciflow/rocm/169472 2025-12-04T09:17:19.0695350Z * [new tag] ciflow/rocm/169514 -> ciflow/rocm/169514 2025-12-04T09:17:19.0697053Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-12-04T09:17:19.0698411Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-12-04T09:17:19.0700455Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-12-04T09:17:19.0701262Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-12-04T09:17:19.0702726Z * [new tag] ciflow/slow/167207 -> ciflow/slow/167207 2025-12-04T09:17:19.0704491Z * [new tag] ciflow/slow/168050 -> ciflow/slow/168050 2025-12-04T09:17:19.0705927Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-12-04T09:17:19.0707433Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-12-04T09:17:19.0711509Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-12-04T09:17:19.0713284Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-12-04T09:17:19.0714944Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-12-04T09:17:19.0716434Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-12-04T09:17:19.0723966Z * [new tag] ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-12-04T09:17:19.0724367Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-12-04T09:17:19.0724961Z * [new tag] 
ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-12-04T09:17:19.0725141Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-12-04T09:17:19.0725471Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-12-04T09:17:19.0725648Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-12-04T09:17:19.0726808Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-12-04T09:17:19.0728443Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-12-04T09:17:19.0730448Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-12-04T09:17:19.0731564Z * [new tag] ciflow/torchbench/168175 -> ciflow/torchbench/168175 2025-12-04T09:17:19.0733237Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-12-04T09:17:19.0734320Z * [new tag] ciflow/trunk/157149 -> ciflow/trunk/157149 2025-12-04T09:17:19.0735675Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-12-04T09:17:19.0736879Z * [new tag] ciflow/trunk/159718 -> ciflow/trunk/159718 2025-12-04T09:17:19.0738156Z * [new tag] ciflow/trunk/160685 -> ciflow/trunk/160685 2025-12-04T09:17:19.0739286Z * [new tag] ciflow/trunk/160729 -> ciflow/trunk/160729 2025-12-04T09:17:19.0740720Z * [new tag] ciflow/trunk/162275 -> ciflow/trunk/162275 2025-12-04T09:17:19.0742431Z * [new tag] ciflow/trunk/162795 -> ciflow/trunk/162795 2025-12-04T09:17:19.0743735Z * [new tag] ciflow/trunk/163245 -> ciflow/trunk/163245 2025-12-04T09:17:19.0744813Z * [new tag] ciflow/trunk/163942 -> ciflow/trunk/163942 2025-12-04T09:17:19.0746181Z * [new tag] ciflow/trunk/165274 -> ciflow/trunk/165274 2025-12-04T09:17:19.0747919Z * [new tag] ciflow/trunk/165483 -> ciflow/trunk/165483 2025-12-04T09:17:19.0749632Z * [new tag] ciflow/trunk/165728 -> ciflow/trunk/165728 2025-12-04T09:17:19.0751088Z * [new tag] ciflow/trunk/165922 -> ciflow/trunk/165922 2025-12-04T09:17:19.0752415Z * 
[new tag] ciflow/trunk/166075 -> ciflow/trunk/166075 2025-12-04T09:17:19.0753725Z * [new tag] ciflow/trunk/166165 -> ciflow/trunk/166165 2025-12-04T09:17:19.0755247Z * [new tag] ciflow/trunk/166829 -> ciflow/trunk/166829 2025-12-04T09:17:19.0756667Z * [new tag] ciflow/trunk/166843 -> ciflow/trunk/166843 2025-12-04T09:17:19.0757991Z * [new tag] ciflow/trunk/166876 -> ciflow/trunk/166876 2025-12-04T09:17:19.0759301Z * [new tag] ciflow/trunk/167207 -> ciflow/trunk/167207 2025-12-04T09:17:19.0760679Z * [new tag] ciflow/trunk/167536 -> ciflow/trunk/167536 2025-12-04T09:17:19.0761942Z * [new tag] ciflow/trunk/167552 -> ciflow/trunk/167552 2025-12-04T09:17:19.0763270Z * [new tag] ciflow/trunk/167555 -> ciflow/trunk/167555 2025-12-04T09:17:19.0764650Z * [new tag] ciflow/trunk/167599 -> ciflow/trunk/167599 2025-12-04T09:17:19.0766051Z * [new tag] ciflow/trunk/167659 -> ciflow/trunk/167659 2025-12-04T09:17:19.0767444Z * [new tag] ciflow/trunk/167672 -> ciflow/trunk/167672 2025-12-04T09:17:19.0768768Z * [new tag] ciflow/trunk/167742 -> ciflow/trunk/167742 2025-12-04T09:17:19.0770080Z * [new tag] ciflow/trunk/167781 -> ciflow/trunk/167781 2025-12-04T09:17:19.0771623Z * [new tag] ciflow/trunk/167837 -> ciflow/trunk/167837 2025-12-04T09:17:19.0772898Z * [new tag] ciflow/trunk/167887 -> ciflow/trunk/167887 2025-12-04T09:17:19.0774212Z * [new tag] ciflow/trunk/167978 -> ciflow/trunk/167978 2025-12-04T09:17:19.0775659Z * [new tag] ciflow/trunk/168050 -> ciflow/trunk/168050 2025-12-04T09:17:19.0776926Z * [new tag] ciflow/trunk/168051 -> ciflow/trunk/168051 2025-12-04T09:17:19.0778185Z * [new tag] ciflow/trunk/168096 -> ciflow/trunk/168096 2025-12-04T09:17:19.0779598Z * [new tag] ciflow/trunk/168127 -> ciflow/trunk/168127 2025-12-04T09:17:19.0780949Z * [new tag] ciflow/trunk/168157 -> ciflow/trunk/168157 2025-12-04T09:17:19.0782276Z * [new tag] ciflow/trunk/168175 -> ciflow/trunk/168175 2025-12-04T09:17:19.0783546Z * [new tag] ciflow/trunk/168209 -> ciflow/trunk/168209 
2025-12-04T09:17:19.0785015Z * [new tag] ciflow/trunk/168213 -> ciflow/trunk/168213 2025-12-04T09:17:19.0786478Z * [new tag] ciflow/trunk/168226 -> ciflow/trunk/168226 2025-12-04T09:17:19.0787899Z * [new tag] ciflow/trunk/168262 -> ciflow/trunk/168262 2025-12-04T09:17:19.0789136Z * [new tag] ciflow/trunk/168275 -> ciflow/trunk/168275 2025-12-04T09:17:19.0790569Z * [new tag] ciflow/trunk/168328 -> ciflow/trunk/168328 2025-12-04T09:17:19.0791884Z * [new tag] ciflow/trunk/168368 -> ciflow/trunk/168368 2025-12-04T09:17:19.0793202Z * [new tag] ciflow/trunk/168917 -> ciflow/trunk/168917 2025-12-04T09:17:19.0794527Z * [new tag] ciflow/trunk/168933 -> ciflow/trunk/168933 2025-12-04T09:17:19.0796029Z * [new tag] ciflow/trunk/168941 -> ciflow/trunk/168941 2025-12-04T09:17:19.0797345Z * [new tag] ciflow/trunk/168955 -> ciflow/trunk/168955 2025-12-04T09:17:19.0798772Z * [new tag] ciflow/trunk/168980 -> ciflow/trunk/168980 2025-12-04T09:17:19.0800321Z * [new tag] ciflow/trunk/169004 -> ciflow/trunk/169004 2025-12-04T09:17:19.0801609Z * [new tag] ciflow/trunk/169006 -> ciflow/trunk/169006 2025-12-04T09:17:19.0802921Z * [new tag] ciflow/trunk/169023 -> ciflow/trunk/169023 2025-12-04T09:17:19.0804252Z * [new tag] ciflow/trunk/169025 -> ciflow/trunk/169025 2025-12-04T09:17:19.0805577Z * [new tag] ciflow/trunk/169048 -> ciflow/trunk/169048 2025-12-04T09:17:19.0806905Z * [new tag] ciflow/trunk/169066 -> ciflow/trunk/169066 2025-12-04T09:17:19.0808406Z * [new tag] ciflow/trunk/169091 -> ciflow/trunk/169091 2025-12-04T09:17:19.0809797Z * [new tag] ciflow/trunk/169102 -> ciflow/trunk/169102 2025-12-04T09:17:19.0811070Z * [new tag] ciflow/trunk/169103 -> ciflow/trunk/169103 2025-12-04T09:17:19.0812524Z * [new tag] ciflow/trunk/169125 -> ciflow/trunk/169125 2025-12-04T09:17:19.0814015Z * [new tag] ciflow/trunk/169139 -> ciflow/trunk/169139 2025-12-04T09:17:19.0815433Z * [new tag] ciflow/trunk/169148 -> ciflow/trunk/169148 2025-12-04T09:17:19.0816766Z * [new tag] ciflow/trunk/169151 -> 
ciflow/trunk/169151 2025-12-04T09:17:19.0818164Z * [new tag] ciflow/trunk/169156 -> ciflow/trunk/169156 2025-12-04T09:17:19.0819689Z * [new tag] ciflow/trunk/169176 -> ciflow/trunk/169176 2025-12-04T09:17:19.0821034Z * [new tag] ciflow/trunk/169204 -> ciflow/trunk/169204 2025-12-04T09:17:19.0822328Z * [new tag] ciflow/trunk/169207 -> ciflow/trunk/169207 2025-12-04T09:17:19.0823650Z * [new tag] ciflow/trunk/169211 -> ciflow/trunk/169211 2025-12-04T09:17:19.0825199Z * [new tag] ciflow/trunk/169231 -> ciflow/trunk/169231 2025-12-04T09:17:19.0826692Z * [new tag] ciflow/trunk/169260 -> ciflow/trunk/169260 2025-12-04T09:17:19.0828165Z * [new tag] ciflow/trunk/169271 -> ciflow/trunk/169271 2025-12-04T09:17:19.0829477Z * [new tag] ciflow/trunk/169280 -> ciflow/trunk/169280 2025-12-04T09:17:19.0831484Z * [new tag] ciflow/trunk/169281 -> ciflow/trunk/169281 2025-12-04T09:17:19.0832727Z * [new tag] ciflow/trunk/169286 -> ciflow/trunk/169286 2025-12-04T09:17:19.0834341Z * [new tag] ciflow/trunk/169293 -> ciflow/trunk/169293 2025-12-04T09:17:19.0835666Z * [new tag] ciflow/trunk/169296 -> ciflow/trunk/169296 2025-12-04T09:17:19.0837052Z * [new tag] ciflow/trunk/169304 -> ciflow/trunk/169304 2025-12-04T09:17:19.0838379Z * [new tag] ciflow/trunk/169305 -> ciflow/trunk/169305 2025-12-04T09:17:19.0839716Z * [new tag] ciflow/trunk/169312 -> ciflow/trunk/169312 2025-12-04T09:17:19.0841296Z * [new tag] ciflow/trunk/169328 -> ciflow/trunk/169328 2025-12-04T09:17:19.0842605Z * [new tag] ciflow/trunk/169343 -> ciflow/trunk/169343 2025-12-04T09:17:19.0844028Z * [new tag] ciflow/trunk/169355 -> ciflow/trunk/169355 2025-12-04T09:17:19.0845349Z * [new tag] ciflow/trunk/169370 -> ciflow/trunk/169370 2025-12-04T09:17:19.0846805Z * [new tag] ciflow/trunk/169379 -> ciflow/trunk/169379 2025-12-04T09:17:19.0848164Z * [new tag] ciflow/trunk/169380 -> ciflow/trunk/169380 2025-12-04T09:17:19.0849465Z * [new tag] ciflow/trunk/169385 -> ciflow/trunk/169385 2025-12-04T09:17:19.0850790Z * [new tag] 
ciflow/trunk/169387 -> ciflow/trunk/169387 2025-12-04T09:17:19.0852295Z * [new tag] ciflow/trunk/169410 -> ciflow/trunk/169410 2025-12-04T09:17:19.0853664Z * [new tag] ciflow/trunk/169412 -> ciflow/trunk/169412 2025-12-04T09:17:19.0854973Z * [new tag] ciflow/trunk/169418 -> ciflow/trunk/169418 2025-12-04T09:17:19.0856272Z * [new tag] ciflow/trunk/169423 -> ciflow/trunk/169423 2025-12-04T09:17:19.0857639Z * [new tag] ciflow/trunk/169427 -> ciflow/trunk/169427 2025-12-04T09:17:19.0859015Z * [new tag] ciflow/trunk/169430 -> ciflow/trunk/169430 2025-12-04T09:17:19.0860373Z * [new tag] ciflow/trunk/169437 -> ciflow/trunk/169437 2025-12-04T09:17:19.0861706Z * [new tag] ciflow/trunk/169442 -> ciflow/trunk/169442 2025-12-04T09:17:19.0863026Z * [new tag] ciflow/trunk/169452 -> ciflow/trunk/169452 2025-12-04T09:17:19.0864333Z * [new tag] ciflow/trunk/169454 -> ciflow/trunk/169454 2025-12-04T09:17:19.0865640Z * [new tag] ciflow/trunk/169459 -> ciflow/trunk/169459 2025-12-04T09:17:19.0867088Z * [new tag] ciflow/trunk/169474 -> ciflow/trunk/169474 2025-12-04T09:17:19.0868444Z * [new tag] ciflow/trunk/169475 -> ciflow/trunk/169475 2025-12-04T09:17:19.0869739Z * [new tag] ciflow/trunk/169476 -> ciflow/trunk/169476 2025-12-04T09:17:19.0871237Z * [new tag] ciflow/trunk/169487 -> ciflow/trunk/169487 2025-12-04T09:17:19.0872547Z * [new tag] ciflow/trunk/169497 -> ciflow/trunk/169497 2025-12-04T09:17:19.0873883Z * [new tag] ciflow/trunk/169503 -> ciflow/trunk/169503 2025-12-04T09:17:19.0875195Z * [new tag] ciflow/trunk/169505 -> ciflow/trunk/169505 2025-12-04T09:17:19.0876571Z * [new tag] ciflow/trunk/169507 -> ciflow/trunk/169507 2025-12-04T09:17:19.0877845Z * [new tag] ciflow/trunk/169514 -> ciflow/trunk/169514 2025-12-04T09:17:19.0879309Z * [new tag] ciflow/trunk/169517 -> ciflow/trunk/169517 2025-12-04T09:17:19.0880552Z * [new tag] ciflow/trunk/169519 -> ciflow/trunk/169519 2025-12-04T09:17:19.0881842Z * [new tag] ciflow/trunk/169528 -> ciflow/trunk/169528 
2025-12-04T09:17:19.0883065Z * [new tag] ciflow/trunk/169541 -> ciflow/trunk/169541 2025-12-04T09:17:19.0884610Z * [new tag] ciflow/trunk/169555 -> ciflow/trunk/169555 2025-12-04T09:17:19.0886471Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-12-04T09:17:19.0888057Z * [new tag] ciflow/vllm/165270 -> ciflow/vllm/165270 2025-12-04T09:17:19.0889314Z * [new tag] ciflow/vllm/165274 -> ciflow/vllm/165274 2025-12-04T09:17:19.0890565Z * [new tag] ciflow/vllm/166494 -> ciflow/vllm/166494 2025-12-04T09:17:19.0891831Z * [new tag] ciflow/vllm/169219 -> ciflow/vllm/169219 2025-12-04T09:17:19.0893060Z * [new tag] ciflow/vllm/169220 -> ciflow/vllm/169220 2025-12-04T09:17:19.0894648Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-12-04T09:17:19.0895901Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-12-04T09:17:19.0897215Z * [new tag] ciflow/xpu/161940 -> ciflow/xpu/161940 2025-12-04T09:17:19.0898554Z * [new tag] ciflow/xpu/163251 -> ciflow/xpu/163251 2025-12-04T09:17:19.0899911Z * [new tag] ciflow/xpu/166829 -> ciflow/xpu/166829 2025-12-04T09:17:19.0901125Z * [new tag] ciflow/xpu/166843 -> ciflow/xpu/166843 2025-12-04T09:17:19.0902434Z * [new tag] ciflow/xpu/167972 -> ciflow/xpu/167972 2025-12-04T09:17:19.0903506Z * [new tag] ciflow/xpu/167981 -> ciflow/xpu/167981 2025-12-04T09:17:19.0904863Z * [new tag] ciflow/xpu/168213 -> ciflow/xpu/168213 2025-12-04T09:17:19.0906118Z * [new tag] ciflow/xpu/168262 -> ciflow/xpu/168262 2025-12-04T09:17:19.0907420Z * [new tag] ciflow/xpu/168328 -> ciflow/xpu/168328 2025-12-04T09:17:19.0909237Z * [new tag] ciflow/xpu/168950 -> ciflow/xpu/168950 2025-12-04T09:17:19.0910984Z * [new tag] ciflow/xpu/169039 -> ciflow/xpu/169039 2025-12-04T09:17:19.0912510Z * [new tag] ciflow/xpu/169200 -> ciflow/xpu/169200 2025-12-04T09:17:19.0913899Z * [new tag] ciflow/xpu/169203 -> ciflow/xpu/169203 2025-12-04T09:17:19.0915175Z * [new tag] ciflow/xpu/169230 -> ciflow/xpu/169230 2025-12-04T09:17:19.0916494Z * [new tag] 
ciflow/xpu/169231 -> ciflow/xpu/169231 2025-12-04T09:17:19.0917957Z * [new tag] ciflow/xpu/169241 -> ciflow/xpu/169241 2025-12-04T09:17:19.0919322Z * [new tag] ciflow/xpu/169280 -> ciflow/xpu/169280 2025-12-04T09:17:19.0920664Z * [new tag] ciflow/xpu/169296 -> ciflow/xpu/169296 2025-12-04T09:17:19.0922124Z * [new tag] ciflow/xpu/169353 -> ciflow/xpu/169353 2025-12-04T09:17:19.0923460Z * [new tag] ciflow/xpu/169410 -> ciflow/xpu/169410 2025-12-04T09:17:19.0924802Z * [new tag] ciflow/xpu/169442 -> ciflow/xpu/169442 2025-12-04T09:17:19.0926159Z * [new tag] ciflow/xpu/169555 -> ciflow/xpu/169555 2025-12-04T09:17:19.0927631Z * [new tag] cslpull75 -> cslpull75 2025-12-04T09:17:19.0929164Z * [new tag] cslpull76 -> cslpull76 2025-12-04T09:17:19.0930528Z * [new tag] cslpull77 -> cslpull77 2025-12-04T09:17:19.0931950Z * [new tag] cslpull78 -> cslpull78 2025-12-04T09:17:19.0933421Z * [new tag] cslpull79 -> cslpull79 2025-12-04T09:17:19.0935181Z * [new tag] cslpull80 -> cslpull80 2025-12-04T09:17:19.0936629Z * [new tag] cslpull81 -> cslpull81 2025-12-04T09:17:19.0938027Z * [new tag] cslpull82 -> cslpull82 2025-12-04T09:17:19.0939522Z * [new tag] cslpull83 -> cslpull83 2025-12-04T09:17:19.0940903Z * [new tag] cslpull84 -> cslpull84 2025-12-04T09:17:19.0942248Z * [new tag] cslpull85 -> cslpull85 2025-12-04T09:17:19.0943661Z * [new tag] cslpull86 -> cslpull86 2025-12-04T09:17:19.0945026Z * [new tag] cslpull87 -> cslpull87 2025-12-04T09:17:19.0946458Z * [new tag] cslpull88 -> cslpull88 2025-12-04T09:17:19.0947878Z * [new tag] cslpull89 -> cslpull89 2025-12-04T09:17:19.0949053Z * [new tag] cslpull90 -> cslpull90 2025-12-04T09:17:19.0950863Z * [new tag] cslpull91 -> cslpull91 2025-12-04T09:17:19.0952205Z * [new tag] cslpull92 -> cslpull92 2025-12-04T09:17:19.0953754Z * [new tag] flight_5 -> flight_5 2025-12-04T09:17:19.0955293Z * [new tag] flight_5.1 -> flight_5.1 2025-12-04T09:17:19.0956698Z * [new tag] flight_5.2 -> flight_5.2 2025-12-04T09:17:19.0958136Z * [new tag] flight_5.3 -> 
flight_5.3 2025-12-04T09:17:19.0959606Z * [new tag] forpull1 -> forpull1 2025-12-04T09:17:19.0961248Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-12-04T09:17:19.0962757Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-12-04T09:17:19.0964120Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-12-04T09:17:19.0965910Z * [new tag] nightly-binary -> nightly-binary 2025-12-04T09:17:19.0967116Z * [new tag] sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-12-04T09:17:19.0968623Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-12-04T09:17:19.0970502Z * [new tag] trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 2025-12-04T09:17:19.0972043Z * [new tag] trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e 2025-12-04T09:17:19.0973755Z * [new tag] trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 2025-12-04T09:17:19.0975436Z * [new tag] trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654 2025-12-04T09:17:19.0976758Z * [new tag] trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb 2025-12-04T09:17:19.0978336Z * [new tag] trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b 2025-12-04T09:17:19.0979877Z * [new tag] trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 2025-12-04T09:17:19.0981284Z * [new tag] trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75 2025-12-04T09:17:19.0982927Z * [new tag] trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 2025-12-04T09:17:19.0984512Z * [new tag] trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 2025-12-04T09:17:19.0985656Z * [new tag] 
trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688 2025-12-04T09:17:19.0987358Z * [new tag] trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10 2025-12-04T09:17:19.0988777Z * [new tag] trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec 2025-12-04T09:17:19.0990368Z * [new tag] trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f 2025-12-04T09:17:19.0992016Z * [new tag] trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 2025-12-04T09:17:19.0993507Z * [new tag] trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 2025-12-04T09:17:19.0995001Z * [new tag] trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 2025-12-04T09:17:19.0996459Z * [new tag] trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 2025-12-04T09:17:19.0997689Z * [new tag] trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e 2025-12-04T09:17:19.0999249Z * [new tag] trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520 2025-12-04T09:17:19.1000782Z * [new tag] trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 2025-12-04T09:17:19.1002311Z * [new tag] trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c 2025-12-04T09:17:19.1003373Z * [new tag] trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d 2025-12-04T09:17:19.1005101Z * [new tag] trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 2025-12-04T09:17:19.1006669Z * [new tag] trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> 
trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de 2025-12-04T09:17:19.1008468Z * [new tag] trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 2025-12-04T09:17:19.1011376Z * [new tag] trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 2025-12-04T09:17:19.1012681Z * [new tag] trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f 2025-12-04T09:17:19.1014415Z * [new tag] trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9 2025-12-04T09:17:19.1015504Z * [new tag] trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87 2025-12-04T09:17:19.1017209Z * [new tag] trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc 2025-12-04T09:17:19.1018654Z * [new tag] trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25 2025-12-04T09:17:19.1020229Z * [new tag] trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 2025-12-04T09:17:19.1021800Z * [new tag] trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf 2025-12-04T09:17:19.1022867Z * [new tag] trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd 2025-12-04T09:17:19.1025176Z * [new tag] trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 2025-12-04T09:17:19.1026773Z * [new tag] trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 2025-12-04T09:17:19.1028098Z * [new tag] trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35 2025-12-04T09:17:19.1029628Z * [new tag] trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec 
2025-12-04T09:17:19.1031222Z * [new tag] trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 2025-12-04T09:17:19.1032519Z * [new tag] trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1 2025-12-04T09:17:19.1034082Z * [new tag] trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 2025-12-04T09:17:19.1035401Z * [new tag] trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 2025-12-04T09:17:19.1037004Z * [new tag] trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 2025-12-04T09:17:19.1038315Z * [new tag] trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf 2025-12-04T09:17:19.1039901Z * [new tag] trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee 2025-12-04T09:17:19.1041448Z * [new tag] trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 2025-12-04T09:17:19.1042477Z * [new tag] trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 2025-12-04T09:17:19.1044125Z * [new tag] trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae 2025-12-04T09:17:19.1045603Z * [new tag] trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f 2025-12-04T09:17:19.1047073Z * [new tag] trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c 2025-12-04T09:17:19.1048510Z * [new tag] trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247 2025-12-04T09:17:19.1049939Z * [new tag] trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c 2025-12-04T09:17:19.1051236Z * [new tag] 
trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a 2025-12-04T09:17:19.1052739Z * [new tag] trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686 2025-12-04T09:17:19.1054464Z * [new tag] trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 2025-12-04T09:17:19.1055522Z * [new tag] trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af 2025-12-04T09:17:19.1057274Z * [new tag] trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 2025-12-04T09:17:19.1058764Z * [new tag] trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873 2025-12-04T09:17:19.1060299Z * [new tag] trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 2025-12-04T09:17:19.1061369Z * [new tag] trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f 2025-12-04T09:17:19.1063251Z * [new tag] trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa 2025-12-04T09:17:19.1064841Z * [new tag] trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c 2025-12-04T09:17:19.1066715Z * [new tag] trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a 2025-12-04T09:17:19.1068166Z * [new tag] trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d 2025-12-04T09:17:19.1069836Z * [new tag] trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 2025-12-04T09:17:19.1071299Z * [new tag] trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 2025-12-04T09:17:19.1072848Z * [new tag] trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> 
trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a 2025-12-04T09:17:19.1074365Z * [new tag] trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 2025-12-04T09:17:19.1075888Z * [new tag] trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 2025-12-04T09:17:19.1077432Z * [new tag] trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 2025-12-04T09:17:19.1078905Z * [new tag] trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96 2025-12-04T09:17:19.1079975Z * [new tag] trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc 2025-12-04T09:17:19.1081911Z * [new tag] trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 2025-12-04T09:17:19.1083425Z * [new tag] trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd 2025-12-04T09:17:19.1084514Z * [new tag] trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 2025-12-04T09:17:19.1086406Z * [new tag] trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 2025-12-04T09:17:19.1087959Z * [new tag] trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c 2025-12-04T09:17:19.1089459Z * [new tag] trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b 2025-12-04T09:17:19.1090556Z * [new tag] trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c 2025-12-04T09:17:19.1092249Z * [new tag] trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 2025-12-04T09:17:19.1093824Z * [new tag] trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 
2025-12-04T09:17:19.1095379Z * [new tag] trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef 2025-12-04T09:17:19.1096677Z * [new tag] trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719 2025-12-04T09:17:19.1098322Z * [new tag] trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b 2025-12-04T09:17:19.1099681Z * [new tag] trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a 2025-12-04T09:17:19.1101292Z * [new tag] trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206 2025-12-04T09:17:19.1102810Z * [new tag] trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 2025-12-04T09:17:19.1104282Z * [new tag] trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 2025-12-04T09:17:19.1105903Z * [new tag] trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d 2025-12-04T09:17:19.1106936Z * [new tag] trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b 2025-12-04T09:17:19.1108657Z * [new tag] trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 2025-12-04T09:17:19.1110272Z * [new tag] trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 2025-12-04T09:17:19.1111779Z * [new tag] trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec 2025-12-04T09:17:19.1113256Z * [new tag] trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 2025-12-04T09:17:19.1114756Z * [new tag] trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d 2025-12-04T09:17:19.1116279Z * [new tag] 
trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a 2025-12-04T09:17:19.1117897Z * [new tag] trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e 2025-12-04T09:17:19.1119356Z * [new tag] trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 2025-12-04T09:17:19.1121270Z * [new tag] trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485 2025-12-04T09:17:19.1122868Z * [new tag] trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734 2025-12-04T09:17:19.1125831Z * [new tag] trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f 2025-12-04T09:17:19.1126248Z * [new tag] trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 2025-12-04T09:17:19.1127751Z * [new tag] trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f -> trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f 2025-12-04T09:17:19.1128726Z * [new tag] trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03 2025-12-04T09:17:19.1130540Z * [new tag] trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 2025-12-04T09:17:19.1132015Z * [new tag] trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 2025-12-04T09:17:19.1133082Z * [new tag] trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 2025-12-04T09:17:19.1134800Z * [new tag] trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca 2025-12-04T09:17:19.1136292Z * [new tag] trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b 2025-12-04T09:17:19.1137757Z * [new tag] trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> 
trunk/7ba4680f3755a560af81aa0f688791e367aa3609 2025-12-04T09:17:19.1139349Z * [new tag] trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b 2025-12-04T09:17:19.1140459Z * [new tag] trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T09:17:19.1142094Z * [new tag] trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 2025-12-04T09:17:19.1143663Z * [new tag] trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed 2025-12-04T09:17:19.1145252Z * [new tag] trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 2025-12-04T09:17:19.1146354Z * [new tag] trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e 2025-12-04T09:17:19.1147988Z * [new tag] trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead 2025-12-04T09:17:19.1149193Z * [new tag] trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718 2025-12-04T09:17:19.1151160Z * [new tag] trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0 2025-12-04T09:17:19.1152960Z * [new tag] trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75 2025-12-04T09:17:19.1154249Z * [new tag] trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece 2025-12-04T09:17:19.1155750Z * [new tag] trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6 2025-12-04T09:17:19.1157423Z * [new tag] trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 2025-12-04T09:17:19.1158888Z * [new tag] trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c 
2025-12-04T09:17:19.1160350Z * [new tag] trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 2025-12-04T09:17:19.1161888Z * [new tag] trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 2025-12-04T09:17:19.1163442Z * [new tag] trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca 2025-12-04T09:17:19.1164973Z * [new tag] trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38 2025-12-04T09:17:19.1166448Z * [new tag] trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 2025-12-04T09:17:19.1167577Z * [new tag] trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c 2025-12-04T09:17:19.1169398Z * [new tag] trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 2025-12-04T09:17:19.1170923Z * [new tag] trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 2025-12-04T09:17:19.1172451Z * [new tag] trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa 2025-12-04T09:17:19.1173923Z * [new tag] trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d 2025-12-04T09:17:19.1175434Z * [new tag] trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 2025-12-04T09:17:19.1176930Z * [new tag] trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 2025-12-04T09:17:19.1178434Z * [new tag] trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d 2025-12-04T09:17:19.1180019Z * [new tag] trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a 2025-12-04T09:17:19.1181581Z * [new tag] 
trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 2025-12-04T09:17:19.1183193Z * [new tag] trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 2025-12-04T09:17:19.1184642Z * [new tag] trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa 2025-12-04T09:17:19.1186367Z * [new tag] trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d 2025-12-04T09:17:19.1187682Z * [new tag] trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c 2025-12-04T09:17:19.1189327Z * [new tag] trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 2025-12-04T09:17:19.1190825Z * [new tag] trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c 2025-12-04T09:17:19.1191926Z * [new tag] trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45 2025-12-04T09:17:19.1193732Z * [new tag] trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 2025-12-04T09:17:19.1195266Z * [new tag] trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e 2025-12-04T09:17:19.1196721Z * [new tag] trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e 2025-12-04T09:17:19.1197881Z * [new tag] trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e 2025-12-04T09:17:19.1199501Z * [new tag] trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 2025-12-04T09:17:19.1201042Z * [new tag] trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 2025-12-04T09:17:19.1202601Z * [new tag] trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> 
trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 2025-12-04T09:17:19.1204213Z * [new tag] trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99 2025-12-04T09:17:19.1205794Z * [new tag] trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 2025-12-04T09:17:19.1207344Z * [new tag] trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 2025-12-04T09:17:19.1209038Z * [new tag] trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a 2025-12-04T09:17:19.1210681Z * [new tag] trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 2025-12-04T09:17:19.1212167Z * [new tag] trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 2025-12-04T09:17:19.1213728Z * [new tag] trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b 2025-12-04T09:17:19.1215418Z * [new tag] trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e 2025-12-04T09:17:19.1216983Z * [new tag] trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf 2025-12-04T09:17:19.1218872Z * [new tag] trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 2025-12-04T09:17:19.1220699Z * [new tag] trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f 2025-12-04T09:17:19.1222376Z * [new tag] trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f 2025-12-04T09:17:19.1223853Z * [new tag] trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2 2025-12-04T09:17:19.1225357Z * [new tag] trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 
2025-12-04T09:17:19.1227139Z * [new tag] trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 2025-12-04T09:17:19.1228243Z * [new tag] trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 2025-12-04T09:17:19.1230075Z * [new tag] trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 2025-12-04T09:17:19.1231845Z * [new tag] trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 2025-12-04T09:17:19.1233443Z * [new tag] trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede 2025-12-04T09:17:19.1235081Z * [new tag] trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac 2025-12-04T09:17:19.1236550Z * [new tag] trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb 2025-12-04T09:17:19.1237685Z * [new tag] trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 2025-12-04T09:17:19.1239597Z * [new tag] trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 2025-12-04T09:17:19.1241273Z * [new tag] trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09 2025-12-04T09:17:19.1242400Z * [new tag] trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 2025-12-04T09:17:19.1244180Z * [new tag] trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a 2025-12-04T09:17:19.1245800Z * [new tag] trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace 2025-12-04T09:17:19.1247231Z * [new tag] trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39 2025-12-04T09:17:19.1248726Z * [new tag] 
trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 2025-12-04T09:17:19.1250258Z * [new tag] trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 2025-12-04T09:17:19.1251828Z * [new tag] trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf 2025-12-04T09:17:19.1253358Z * [new tag] trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 2025-12-04T09:17:19.1254808Z * [new tag] trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d 2025-12-04T09:17:19.1256383Z * [new tag] trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 2025-12-04T09:17:19.1257926Z * [new tag] trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 2025-12-04T09:17:19.1259472Z * [new tag] trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e 2025-12-04T09:17:19.1261105Z * [new tag] trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a 2025-12-04T09:17:19.1262594Z * [new tag] trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b 2025-12-04T09:17:19.1264204Z * [new tag] trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec 2025-12-04T09:17:19.1265841Z * [new tag] trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf 2025-12-04T09:17:19.1267391Z * [new tag] trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd 2025-12-04T09:17:19.1268854Z * [new tag] trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889 2025-12-04T09:17:19.1270411Z * [new tag] trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> 
trunk/ded9bcd61a059bf723e6e84689552962b480ea77 2025-12-04T09:17:19.1272207Z * [new tag] trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c 2025-12-04T09:17:19.1273792Z * [new tag] trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b 2025-12-04T09:17:19.1275093Z * [new tag] trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235 2025-12-04T09:17:19.1276796Z * [new tag] trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e 2025-12-04T09:17:19.1278450Z * [new tag] trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc 2025-12-04T09:17:19.1280000Z * [new tag] trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 2025-12-04T09:17:19.1281563Z * [new tag] trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf 2025-12-04T09:17:19.1283083Z * [new tag] trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e 2025-12-04T09:17:19.1284598Z * [new tag] trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e 2025-12-04T09:17:19.1286508Z * [new tag] trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 2025-12-04T09:17:19.1288034Z * [new tag] trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 2025-12-04T09:17:19.1289602Z * [new tag] trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 2025-12-04T09:17:19.1291054Z * [new tag] trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598 2025-12-04T09:17:19.1292568Z * [new tag] trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 
2025-12-04T09:17:19.1294110Z * [new tag] trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 2025-12-04T09:17:19.1295626Z * [new tag] trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 2025-12-04T09:17:19.1297111Z * [new tag] trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 2025-12-04T09:17:19.1298636Z * [new tag] trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 2025-12-04T09:17:19.1300423Z * [new tag] trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 2025-12-04T09:17:19.1301527Z * [new tag] trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 2025-12-04T09:17:19.1303324Z * [new tag] trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b 2025-12-04T09:17:19.1304900Z * [new tag] trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 2025-12-04T09:17:19.1306844Z * [new tag] trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 2025-12-04T09:17:19.1308715Z * [new tag] trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 2025-12-04T09:17:19.1310323Z * [new tag] trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:17:19.1311049Z * [new tag] v0.1.1 -> v0.1.1 2025-12-04T09:17:19.1312682Z * [new tag] v0.1.10 -> v0.1.10 2025-12-04T09:17:19.1314057Z * [new tag] v0.1.11 -> v0.1.11 2025-12-04T09:17:19.1315638Z * [new tag] v0.1.12 -> v0.1.12 2025-12-04T09:17:19.1316972Z * [new tag] v0.1.2 -> v0.1.2 2025-12-04T09:17:19.1318349Z * [new tag] v0.1.3 -> v0.1.3 2025-12-04T09:17:19.1319731Z * [new tag] v0.1.4 -> v0.1.4 2025-12-04T09:17:19.1321152Z * [new tag] v0.1.5 -> v0.1.5 
2025-12-04T09:17:19.1322556Z * [new tag] v0.1.6 -> v0.1.6 2025-12-04T09:17:19.1323886Z * [new tag] v0.1.7 -> v0.1.7 2025-12-04T09:17:19.1325323Z * [new tag] v0.1.8 -> v0.1.8 2025-12-04T09:17:19.1326681Z * [new tag] v0.1.9 -> v0.1.9 2025-12-04T09:17:19.1328170Z * [new tag] v0.2.0 -> v0.2.0 2025-12-04T09:17:19.1329599Z * [new tag] v0.3.0 -> v0.3.0 2025-12-04T09:17:19.1331088Z * [new tag] v0.3.1 -> v0.3.1 2025-12-04T09:17:19.1332676Z * [new tag] v0.4.0 -> v0.4.0 2025-12-04T09:17:19.1334149Z * [new tag] v0.4.1 -> v0.4.1 2025-12-04T09:17:19.1335566Z * [new tag] v1.0.0 -> v1.0.0 2025-12-04T09:17:19.1336957Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-12-04T09:17:19.1338373Z * [new tag] v1.0.1 -> v1.0.1 2025-12-04T09:17:19.1339950Z * [new tag] v1.0rc0 -> v1.0rc0 2025-12-04T09:17:19.1341143Z * [new tag] v1.0rc1 -> v1.0rc1 2025-12-04T09:17:19.1342570Z * [new tag] v1.1.0 -> v1.1.0 2025-12-04T09:17:19.1344446Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-12-04T09:17:19.1346087Z * [new tag] v1.10.0 -> v1.10.0 2025-12-04T09:17:19.1347616Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-12-04T09:17:19.1349036Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-12-04T09:17:19.1350248Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-12-04T09:17:19.1351743Z * [new tag] v1.10.1 -> v1.10.1 2025-12-04T09:17:19.1352950Z * [new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-12-04T09:17:19.1354168Z * [new tag] v1.10.2 -> v1.10.2 2025-12-04T09:17:19.1355381Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-12-04T09:17:19.1356854Z * [new tag] v1.11.0 -> v1.11.0 2025-12-04T09:17:19.1358271Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-12-04T09:17:19.1359882Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-12-04T09:17:19.1361463Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-12-04T09:17:19.1362954Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-12-04T09:17:19.1364404Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-12-04T09:17:19.1365648Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-12-04T09:17:19.1366860Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 
2025-12-04T09:17:19.1368481Z * [new tag] v1.12.0 -> v1.12.0 2025-12-04T09:17:19.1369840Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-12-04T09:17:19.1371353Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-12-04T09:17:19.1372899Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-12-04T09:17:19.1374374Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-12-04T09:17:19.1375808Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-12-04T09:17:19.1377388Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-12-04T09:17:19.1378615Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-12-04T09:17:19.1379975Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-12-04T09:17:19.1381168Z * [new tag] v1.12.1 -> v1.12.1 2025-12-04T09:17:19.1382732Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-12-04T09:17:19.1384447Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-12-04T09:17:19.1385969Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-12-04T09:17:19.1387426Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-12-04T09:17:19.1388645Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-12-04T09:17:19.1390129Z * [new tag] v1.13.0 -> v1.13.0 2025-12-04T09:17:19.1391543Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-12-04T09:17:19.1392914Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-12-04T09:17:19.1394273Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-12-04T09:17:19.1395878Z * [new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-12-04T09:17:19.1397131Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-12-04T09:17:19.1398347Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-12-04T09:17:19.1399835Z * [new tag] v1.13.1 -> v1.13.1 2025-12-04T09:17:19.1401071Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-12-04T09:17:19.1402513Z * [new tag] v1.2.0 -> v1.2.0 2025-12-04T09:17:19.1403945Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-12-04T09:17:19.1405344Z * [new tag] v1.3.0 -> v1.3.0 2025-12-04T09:17:19.1406873Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-12-04T09:17:19.1408249Z * [new tag] v1.3.1 -> v1.3.1 2025-12-04T09:17:19.1413176Z * [new tag] v1.4.0 -> v1.4.0 2025-12-04T09:17:19.1414608Z * [new tag] 
v1.4.0a0 -> v1.4.0a0 2025-12-04T09:17:19.1415827Z * [new tag] v1.4.1 -> v1.4.1 2025-12-04T09:17:19.1417335Z * [new tag] v1.5.0 -> v1.5.0 2025-12-04T09:17:19.1418816Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-12-04T09:17:19.1420462Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-12-04T09:17:19.1421972Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-12-04T09:17:19.1423311Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-12-04T09:17:19.1424512Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-12-04T09:17:19.1426039Z * [new tag] v1.5.1 -> v1.5.1 2025-12-04T09:17:19.1427291Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-12-04T09:17:19.1428464Z * [new tag] v1.6.0 -> v1.6.0 2025-12-04T09:17:19.1429958Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-12-04T09:17:19.1431591Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-12-04T09:17:19.1432947Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-12-04T09:17:19.1434337Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-12-04T09:17:19.1435920Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-12-04T09:17:19.1437302Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-12-04T09:17:19.1439039Z * [new tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-12-04T09:17:19.1440555Z * [new tag] v1.7.0 -> v1.7.0 2025-12-04T09:17:19.1442031Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-12-04T09:17:19.1443554Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 2025-12-04T09:17:19.1445029Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-12-04T09:17:19.1446257Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-12-04T09:17:19.1447724Z * [new tag] v1.7.1 -> v1.7.1 2025-12-04T09:17:19.1449292Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-12-04T09:17:19.1450748Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-12-04T09:17:19.1451987Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-12-04T09:17:19.1453482Z * [new tag] v1.8.0 -> v1.8.0 2025-12-04T09:17:19.1454720Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-12-04T09:17:19.1456283Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-12-04T09:17:19.1457727Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-12-04T09:17:19.1459176Z * [new tag] v1.8.0-rc4 -> 
v1.8.0-rc4 2025-12-04T09:17:19.1460434Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-12-04T09:17:19.1461671Z * [new tag] v1.8.1 -> v1.8.1 2025-12-04T09:17:19.1463168Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-12-04T09:17:19.1464391Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-12-04T09:17:19.1465653Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-12-04T09:17:19.1467550Z * [new tag] v1.8.2 -> v1.8.2 2025-12-04T09:17:19.1468784Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-12-04T09:17:19.1470277Z * [new tag] v1.9.0 -> v1.9.0 2025-12-04T09:17:19.1471768Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-12-04T09:17:19.1473259Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-12-04T09:17:19.1474770Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-12-04T09:17:19.1476026Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-12-04T09:17:19.1477479Z * [new tag] v1.9.1 -> v1.9.1 2025-12-04T09:17:19.1479101Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-12-04T09:17:19.1480325Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-12-04T09:17:19.1481885Z * [new tag] v2.0.0 -> v2.0.0 2025-12-04T09:17:19.1483239Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-12-04T09:17:19.1484751Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-12-04T09:17:19.1486223Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-12-04T09:17:19.1487692Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 2025-12-04T09:17:19.1489184Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-12-04T09:17:19.1490530Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-12-04T09:17:19.1491963Z * [new tag] v2.0.1 -> v2.0.1 2025-12-04T09:17:19.1493420Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-12-04T09:17:19.1494410Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-12-04T09:17:19.1496130Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-12-04T09:17:19.1497387Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-12-04T09:17:19.1499361Z * [new tag] v2.1.0 -> v2.1.0 2025-12-04T09:17:19.1500849Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-12-04T09:17:19.1502448Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-12-04T09:17:19.1504043Z * [new tag] v2.1.0-rc3 -> 
v2.1.0-rc3 2025-12-04T09:17:19.1505510Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-12-04T09:17:19.1506963Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-12-04T09:17:19.1508181Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-12-04T09:17:19.1510052Z * [new tag] v2.1.1 -> v2.1.1 2025-12-04T09:17:19.1511547Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-12-04T09:17:19.1513073Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-12-04T09:17:19.1514643Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-12-04T09:17:19.1516223Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-12-04T09:17:19.1517600Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-12-04T09:17:19.1518816Z * [new tag] v2.1.1-rc6 -> v2.1.1-rc6 2025-12-04T09:17:19.1520250Z * [new tag] v2.1.2 -> v2.1.2 2025-12-04T09:17:19.1521877Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-12-04T09:17:19.1523275Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-12-04T09:17:19.1524527Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-12-04T09:17:19.1526088Z * [new tag] v2.2.0 -> v2.2.0 2025-12-04T09:17:19.1527562Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-12-04T09:17:19.1528941Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-12-04T09:17:19.1530320Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-12-04T09:17:19.1532292Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-12-04T09:17:19.1533759Z * [new tag] v2.2.0-rc5 -> v2.2.0-rc5 2025-12-04T09:17:19.1535193Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-12-04T09:17:19.1536450Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-12-04T09:17:19.1537673Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-12-04T09:17:19.1539341Z * [new tag] v2.2.1 -> v2.2.1 2025-12-04T09:17:19.1540834Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-12-04T09:17:19.1542098Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-12-04T09:17:19.1543317Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-12-04T09:17:19.1544585Z * [new tag] v2.2.2 -> v2.2.2 2025-12-04T09:17:19.1546163Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-12-04T09:17:19.1547446Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-12-04T09:17:19.1548743Z * [new tag] 
v2.2.2-rc3 -> v2.2.2-rc3 2025-12-04T09:17:19.1550468Z * [new tag] v2.3.0 -> v2.3.0 2025-12-04T09:17:19.1551720Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-12-04T09:17:19.1553239Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-12-04T09:17:19.1554831Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-12-04T09:17:19.1555865Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-12-04T09:17:19.1557588Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-12-04T09:17:19.1559095Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-12-04T09:17:19.1560520Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-12-04T09:17:19.1561996Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-12-04T09:17:19.1563241Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-12-04T09:17:19.1564772Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-12-04T09:17:19.1566234Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-12-04T09:17:19.1567501Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-12-04T09:17:19.1568709Z * [new tag] v2.3.1 -> v2.3.1 2025-12-04T09:17:19.1570232Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-12-04T09:17:19.1571730Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-12-04T09:17:19.1573278Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-12-04T09:17:19.1574696Z * [new tag] v2.4.0 -> v2.4.0 2025-12-04T09:17:19.1576212Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-12-04T09:17:19.1577631Z * [new tag] v2.4.0-rc2 -> v2.4.0-rc2 2025-12-04T09:17:19.1579253Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-12-04T09:17:19.1580643Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-12-04T09:17:19.1582137Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-12-04T09:17:19.1583837Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-12-04T09:17:19.1585446Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-12-04T09:17:19.1586910Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-12-04T09:17:19.1588410Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-12-04T09:17:19.1589667Z * [new tag] v2.4.1 -> v2.4.1 2025-12-04T09:17:19.1591237Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-12-04T09:17:19.1592700Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 
2025-12-04T09:17:19.1594186Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-12-04T09:17:19.1595778Z * [new tag] v2.5.0 -> v2.5.0 2025-12-04T09:17:19.1597241Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-12-04T09:17:19.1598441Z * [new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-12-04T09:17:19.1599847Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-12-04T09:17:19.1601316Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-12-04T09:17:19.1602839Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-12-04T09:17:19.1604254Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-12-04T09:17:19.1605785Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-12-04T09:17:19.1607238Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-12-04T09:17:19.1608849Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-12-04T09:17:19.1610607Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-12-04T09:17:19.1611465Z * [new tag] v2.5.1 -> v2.5.1 2025-12-04T09:17:19.1612945Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-12-04T09:17:19.1614205Z * [new tag] v2.6.0 -> v2.6.0 2025-12-04T09:17:19.1615720Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-12-04T09:17:19.1617207Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-12-04T09:17:19.1618673Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-12-04T09:17:19.1620333Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-12-04T09:17:19.1621975Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 2025-12-04T09:17:19.1623538Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-12-04T09:17:19.1625513Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-12-04T09:17:19.1627132Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-12-04T09:17:19.1628584Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-12-04T09:17:19.1630235Z * [new tag] v2.7.0 -> v2.7.0 2025-12-04T09:17:19.1631684Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-12-04T09:17:19.1632971Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-12-04T09:17:19.1634536Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-12-04T09:17:19.1636013Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-12-04T09:17:19.1637474Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-12-04T09:17:19.1638988Z * [new tag] 
v2.7.0-rc5 -> v2.7.0-rc5 2025-12-04T09:17:19.1640390Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-12-04T09:17:19.1641863Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-12-04T09:17:19.1643508Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-12-04T09:17:19.1645084Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-12-04T09:17:19.1646349Z * [new tag] v2.7.1 -> v2.7.1 2025-12-04T09:17:19.1647921Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-12-04T09:17:19.1649444Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-12-04T09:17:19.1651115Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-12-04T09:17:19.1652654Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-12-04T09:17:19.1654144Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-12-04T09:17:19.1655419Z * [new tag] v2.8.0 -> v2.8.0 2025-12-04T09:17:19.1657004Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-12-04T09:17:19.1658420Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-12-04T09:17:19.1660106Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-12-04T09:17:19.1661715Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-12-04T09:17:19.1663238Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-12-04T09:17:19.1664758Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-12-04T09:17:19.1666309Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-12-04T09:17:19.1667772Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-12-04T09:17:19.1669363Z * [new tag] v2.9.0 -> v2.9.0 2025-12-04T09:17:19.1670845Z * [new tag] v2.9.0-rc1 -> v2.9.0-rc1 2025-12-04T09:17:19.1672438Z * [new tag] v2.9.0-rc10 -> v2.9.0-rc10 2025-12-04T09:17:19.1673893Z * [new tag] v2.9.0-rc11 -> v2.9.0-rc11 2025-12-04T09:17:19.1675631Z * [new tag] v2.9.0-rc2 -> v2.9.0-rc2 2025-12-04T09:17:19.1677174Z * [new tag] v2.9.0-rc3 -> v2.9.0-rc3 2025-12-04T09:17:19.1678692Z * [new tag] v2.9.0-rc4 -> v2.9.0-rc4 2025-12-04T09:17:19.1680211Z * [new tag] v2.9.0-rc5 -> v2.9.0-rc5 2025-12-04T09:17:19.1681974Z * [new tag] v2.9.0-rc6 -> v2.9.0-rc6 2025-12-04T09:17:19.1683460Z * [new tag] v2.9.0-rc7 -> v2.9.0-rc7 2025-12-04T09:17:19.1685162Z * [new tag] v2.9.0-rc8 -> v2.9.0-rc8 
2025-12-04T09:17:19.1686439Z * [new tag] v2.9.0-rc9 -> v2.9.0-rc9 2025-12-04T09:17:19.1687744Z * [new tag] v2.9.1 -> v2.9.1 2025-12-04T09:17:19.1689221Z * [new tag] v2.9.1-rc1 -> v2.9.1-rc1 2025-12-04T09:17:19.1690780Z * [new tag] v2.9.1-rc2 -> v2.9.1-rc2 2025-12-04T09:17:19.1693396Z * [new tag] viable/strict/1759343184 -> viable/strict/1759343184 2025-12-04T09:17:19.1694842Z * [new tag] viable/strict/1759346540 -> viable/strict/1759346540 2025-12-04T09:17:19.1696190Z * [new tag] viable/strict/1759348181 -> viable/strict/1759348181 2025-12-04T09:17:19.1697606Z * [new tag] viable/strict/1759350324 -> viable/strict/1759350324 2025-12-04T09:17:19.1699099Z * [new tag] viable/strict/1759351793 -> viable/strict/1759351793 2025-12-04T09:17:19.1700621Z * [new tag] viable/strict/1759353844 -> viable/strict/1759353844 2025-12-04T09:17:19.1701964Z * [new tag] viable/strict/1759355374 -> viable/strict/1759355374 2025-12-04T09:17:19.1703352Z * [new tag] viable/strict/1759357472 -> viable/strict/1759357472 2025-12-04T09:17:19.1704728Z * [new tag] viable/strict/1759361002 -> viable/strict/1759361002 2025-12-04T09:17:19.1706569Z * [new tag] viable/strict/1759362585 -> viable/strict/1759362585 2025-12-04T09:17:19.1708240Z * [new tag] viable/strict/1759365359 -> viable/strict/1759365359 2025-12-04T09:17:19.1709991Z * [new tag] viable/strict/1759370089 -> viable/strict/1759370089 2025-12-04T09:17:19.1711478Z * [new tag] viable/strict/1759377554 -> viable/strict/1759377554 2025-12-04T09:17:19.1713001Z * [new tag] viable/strict/1759379133 -> viable/strict/1759379133 2025-12-04T09:17:19.1714446Z * [new tag] viable/strict/1759389871 -> viable/strict/1759389871 2025-12-04T09:17:19.1716023Z * [new tag] viable/strict/1759393562 -> viable/strict/1759393562 2025-12-04T09:17:19.1717512Z * [new tag] viable/strict/1759395076 -> viable/strict/1759395076 2025-12-04T09:17:19.1719015Z * [new tag] viable/strict/1759398579 -> viable/strict/1759398579 2025-12-04T09:17:19.1720521Z * [new tag] 
viable/strict/1759404142 -> viable/strict/1759404142 2025-12-04T09:17:19.1721961Z * [new tag] viable/strict/1759405773 -> viable/strict/1759405773 2025-12-04T09:17:19.1723472Z * [new tag] viable/strict/1759408041 -> viable/strict/1759408041 2025-12-04T09:17:19.1724948Z * [new tag] viable/strict/1759411593 -> viable/strict/1759411593 2025-12-04T09:17:19.1726372Z * [new tag] viable/strict/1759427395 -> viable/strict/1759427395 2025-12-04T09:17:19.1727832Z * [new tag] viable/strict/1759434582 -> viable/strict/1759434582 2025-12-04T09:17:19.1729353Z * [new tag] viable/strict/1759436720 -> viable/strict/1759436720 2025-12-04T09:17:19.1730966Z * [new tag] viable/strict/1759440219 -> viable/strict/1759440219 2025-12-04T09:17:19.1732368Z * [new tag] viable/strict/1759441948 -> viable/strict/1759441948 2025-12-04T09:17:19.1733836Z * [new tag] viable/strict/1759443860 -> viable/strict/1759443860 2025-12-04T09:17:19.1735365Z * [new tag] viable/strict/1759445377 -> viable/strict/1759445377 2025-12-04T09:17:19.1736866Z * [new tag] viable/strict/1759447415 -> viable/strict/1759447415 2025-12-04T09:17:19.1744420Z * [new tag] viable/strict/1759451750 -> viable/strict/1759451750 2025-12-04T09:17:19.1744769Z * [new tag] viable/strict/1759453910 -> viable/strict/1759453910 2025-12-04T09:17:19.1744962Z * [new tag] viable/strict/1759456483 -> viable/strict/1759456483 2025-12-04T09:17:19.1745147Z * [new tag] viable/strict/1759459279 -> viable/strict/1759459279 2025-12-04T09:17:19.1745335Z * [new tag] viable/strict/1759460742 -> viable/strict/1759460742 2025-12-04T09:17:19.1745771Z * [new tag] viable/strict/1759462025 -> viable/strict/1759462025 2025-12-04T09:17:19.1747880Z * [new tag] viable/strict/1759469086 -> viable/strict/1759469086 2025-12-04T09:17:19.1748826Z * [new tag] viable/strict/1759470581 -> viable/strict/1759470581 2025-12-04T09:17:19.1750508Z * [new tag] viable/strict/1759472786 -> viable/strict/1759472786 2025-12-04T09:17:19.1751993Z * [new tag] viable/strict/1759476294 
-> viable/strict/1759476294 2025-12-04T09:17:19.1753480Z * [new tag] viable/strict/1759479963 -> viable/strict/1759479963 2025-12-04T09:17:19.1754939Z * [new tag] viable/strict/1759492177 -> viable/strict/1759492177 2025-12-04T09:17:19.1756377Z * [new tag] viable/strict/1759519278 -> viable/strict/1759519278 2025-12-04T09:17:19.1757846Z * [new tag] viable/strict/1759524580 -> viable/strict/1759524580 2025-12-04T09:17:19.1759270Z * [new tag] viable/strict/1759528193 -> viable/strict/1759528193 2025-12-04T09:17:19.1760946Z * [new tag] viable/strict/1759533797 -> viable/strict/1759533797 2025-12-04T09:17:19.1762463Z * [new tag] viable/strict/1759542780 -> viable/strict/1759542780 2025-12-04T09:17:19.1763954Z * [new tag] viable/strict/1759549779 -> viable/strict/1759549779 2025-12-04T09:17:19.1765445Z * [new tag] viable/strict/1759555455 -> viable/strict/1759555455 2025-12-04T09:17:19.1766921Z * [new tag] viable/strict/1759559176 -> viable/strict/1759559176 2025-12-04T09:17:19.1768406Z * [new tag] viable/strict/1759560629 -> viable/strict/1759560629 2025-12-04T09:17:19.1769867Z * [new tag] viable/strict/1759569848 -> viable/strict/1759569848 2025-12-04T09:17:19.1771599Z * [new tag] viable/strict/1759571382 -> viable/strict/1759571382 2025-12-04T09:17:19.1773001Z * [new tag] viable/strict/1759573474 -> viable/strict/1759573474 2025-12-04T09:17:19.1774460Z * [new tag] viable/strict/1759618187 -> viable/strict/1759618187 2025-12-04T09:17:19.1775976Z * [new tag] viable/strict/1759626742 -> viable/strict/1759626742 2025-12-04T09:17:19.1777536Z * [new tag] viable/strict/1759632427 -> viable/strict/1759632427 2025-12-04T09:17:19.1779051Z * [new tag] viable/strict/1759634971 -> viable/strict/1759634971 2025-12-04T09:17:19.1780693Z * [new tag] viable/strict/1759661382 -> viable/strict/1759661382 2025-12-04T09:17:19.1782236Z * [new tag] viable/strict/1759663294 -> viable/strict/1759663294 2025-12-04T09:17:19.1783539Z * [new tag] viable/strict/1759708178 -> 
viable/strict/1759708178 2025-12-04T09:17:19.1785131Z * [new tag] viable/strict/1759715695 -> viable/strict/1759715695 2025-12-04T09:17:19.1786658Z * [new tag] viable/strict/1759728293 -> viable/strict/1759728293 2025-12-04T09:17:19.1788660Z * [new tag] viable/strict/1759735513 -> viable/strict/1759735513 2025-12-04T09:17:19.1790278Z * [new tag] viable/strict/1759739177 -> viable/strict/1759739177 2025-12-04T09:17:19.1791718Z * [new tag] viable/strict/1759758635 -> viable/strict/1759758635 2025-12-04T09:17:19.1793205Z * [new tag] viable/strict/1759765784 -> viable/strict/1759765784 2025-12-04T09:17:19.1794693Z * [new tag] viable/strict/1759767948 -> viable/strict/1759767948 2025-12-04T09:17:19.1796230Z * [new tag] viable/strict/1759771461 -> viable/strict/1759771461 2025-12-04T09:17:19.1797565Z * [new tag] viable/strict/1759776706 -> viable/strict/1759776706 2025-12-04T09:17:19.1799137Z * [new tag] viable/strict/1759782317 -> viable/strict/1759782317 2025-12-04T09:17:19.1800688Z * [new tag] viable/strict/1759783777 -> viable/strict/1759783777 2025-12-04T09:17:19.1802258Z * [new tag] viable/strict/1759785815 -> viable/strict/1759785815 2025-12-04T09:17:19.1803708Z * [new tag] viable/strict/1759789459 -> viable/strict/1759789459 2025-12-04T09:17:19.1805248Z * [new tag] viable/strict/1759790974 -> viable/strict/1759790974 2025-12-04T09:17:19.1806600Z * [new tag] viable/strict/1759794583 -> viable/strict/1759794583 2025-12-04T09:17:19.1808132Z * [new tag] viable/strict/1759797408 -> viable/strict/1759797408 2025-12-04T09:17:19.1811989Z * [new tag] viable/strict/1759799518 -> viable/strict/1759799518 2025-12-04T09:17:19.1813463Z * [new tag] viable/strict/1759804909 -> viable/strict/1759804909 2025-12-04T09:17:19.1814965Z * [new tag] viable/strict/1759807643 -> viable/strict/1759807643 2025-12-04T09:17:19.1816461Z * [new tag] viable/strict/1759809089 -> viable/strict/1759809089 2025-12-04T09:17:19.1817929Z * [new tag] viable/strict/1759811145 -> viable/strict/1759811145 
2025-12-04T09:17:19.1819593Z * [new tag] viable/strict/1759812581 -> viable/strict/1759812581 2025-12-04T09:17:19.1821126Z * [new tag] viable/strict/1759814683 -> viable/strict/1759814683 2025-12-04T09:17:19.1822615Z * [new tag] viable/strict/1759821889 -> viable/strict/1759821889 2025-12-04T09:17:19.1824120Z * [new tag] viable/strict/1759823376 -> viable/strict/1759823376 2025-12-04T09:17:19.1825595Z * [new tag] viable/strict/1759827107 -> viable/strict/1759827107 2025-12-04T09:17:19.1827064Z * [new tag] viable/strict/1759830577 -> viable/strict/1759830577 2025-12-04T09:17:19.1828687Z * [new tag] viable/strict/1759832720 -> viable/strict/1759832720 2025-12-04T09:17:19.1830160Z * [new tag] viable/strict/1759842063 -> viable/strict/1759842063 2025-12-04T09:17:19.1831622Z * [new tag] viable/strict/1759847121 -> viable/strict/1759847121 2025-12-04T09:17:19.1833450Z * [new tag] viable/strict/1759850721 -> viable/strict/1759850721 2025-12-04T09:17:19.1835123Z * [new tag] viable/strict/1759857870 -> viable/strict/1759857870 2025-12-04T09:17:19.1836634Z * [new tag] viable/strict/1759863143 -> viable/strict/1759863143 2025-12-04T09:17:19.1838183Z * [new tag] viable/strict/1759875874 -> viable/strict/1759875874 2025-12-04T09:17:19.1839807Z * [new tag] viable/strict/1759877385 -> viable/strict/1759877385 2025-12-04T09:17:19.1841300Z * [new tag] viable/strict/1759883801 -> viable/strict/1759883801 2025-12-04T09:17:19.1842958Z * [new tag] viable/strict/1759885922 -> viable/strict/1759885922 2025-12-04T09:17:19.1844323Z * [new tag] viable/strict/1759888488 -> viable/strict/1759888488 2025-12-04T09:17:19.1845808Z * [new tag] viable/strict/1759895471 -> viable/strict/1759895471 2025-12-04T09:17:19.1847329Z * [new tag] viable/strict/1759904803 -> viable/strict/1759904803 2025-12-04T09:17:19.1848995Z * [new tag] viable/strict/1759908300 -> viable/strict/1759908300 2025-12-04T09:17:19.1850536Z * [new tag] viable/strict/1759915520 -> viable/strict/1759915520 
2025-12-04T09:17:19.1852061Z * [new tag] viable/strict/1759916978 -> viable/strict/1759916978 2025-12-04T09:17:19.1853396Z * [new tag] viable/strict/1759930024 -> viable/strict/1759930024 2025-12-04T09:17:19.1854906Z * [new tag] viable/strict/1759948122 -> viable/strict/1759948122 2025-12-04T09:17:19.1856411Z * [new tag] viable/strict/1759952983 -> viable/strict/1759952983 2025-12-04T09:17:19.1857933Z * [new tag] viable/strict/1759955121 -> viable/strict/1759955121 2025-12-04T09:17:19.1859508Z * [new tag] viable/strict/1759962298 -> viable/strict/1759962298 2025-12-04T09:17:19.1861046Z * [new tag] viable/strict/1759965837 -> viable/strict/1759965837 2025-12-04T09:17:19.1862602Z * [new tag] viable/strict/1759970213 -> viable/strict/1759970213 2025-12-04T09:17:19.1864113Z * [new tag] viable/strict/1759974894 -> viable/strict/1759974894 2025-12-04T09:17:19.1865583Z * [new tag] viable/strict/1759977763 -> viable/strict/1759977763 2025-12-04T09:17:19.1867099Z * [new tag] viable/strict/1759979241 -> viable/strict/1759979241 2025-12-04T09:17:19.1868644Z * [new tag] viable/strict/1759985417 -> viable/strict/1759985417 2025-12-04T09:17:19.1870144Z * [new tag] viable/strict/1759987490 -> viable/strict/1759987490 2025-12-04T09:17:19.1871635Z * [new tag] viable/strict/1759996180 -> viable/strict/1759996180 2025-12-04T09:17:19.1873108Z * [new tag] viable/strict/1760065682 -> viable/strict/1760065682 2025-12-04T09:17:19.1874610Z * [new tag] viable/strict/1760066894 -> viable/strict/1760066894 2025-12-04T09:17:19.1876203Z * [new tag] viable/strict/1760070345 -> viable/strict/1760070345 2025-12-04T09:17:19.1877701Z * [new tag] viable/strict/1760089782 -> viable/strict/1760089782 2025-12-04T09:17:19.1879189Z * [new tag] viable/strict/1760091921 -> viable/strict/1760091921 2025-12-04T09:17:19.1880655Z * [new tag] viable/strict/1760127924 -> viable/strict/1760127924 2025-12-04T09:17:19.1882156Z * [new tag] viable/strict/1760129489 -> viable/strict/1760129489 
2025-12-04T09:17:19.1883705Z * [new tag] viable/strict/1760132980 -> viable/strict/1760132980 2025-12-04T09:17:19.1885339Z * [new tag] viable/strict/1760135060 -> viable/strict/1760135060 2025-12-04T09:17:19.1886920Z * [new tag] viable/strict/1760215782 -> viable/strict/1760215782 2025-12-04T09:17:19.1888925Z * [new tag] viable/strict/1760273849 -> viable/strict/1760273849 2025-12-04T09:17:19.1890420Z * [new tag] viable/strict/1760275517 -> viable/strict/1760275517 2025-12-04T09:17:19.1891896Z * [new tag] viable/strict/1760276979 -> viable/strict/1760276979 2025-12-04T09:17:19.1893452Z * [new tag] viable/strict/1760279007 -> viable/strict/1760279007 2025-12-04T09:17:19.1894809Z * [new tag] viable/strict/1760286328 -> viable/strict/1760286328 2025-12-04T09:17:19.1896152Z * [new tag] viable/strict/1760493304 -> viable/strict/1760493304 2025-12-04T09:17:19.1897746Z * [new tag] viable/strict/1760496298 -> viable/strict/1760496298 2025-12-04T09:17:19.1899197Z * [new tag] viable/strict/1760518396 -> viable/strict/1760518396 2025-12-04T09:17:19.1900723Z * [new tag] viable/strict/1760534864 -> viable/strict/1760534864 2025-12-04T09:17:19.1902238Z * [new tag] viable/strict/1760549062 -> viable/strict/1760549062 2025-12-04T09:17:19.1903817Z * [new tag] viable/strict/1760552799 -> viable/strict/1760552799 2025-12-04T09:17:19.1905378Z * [new tag] viable/strict/1760554355 -> viable/strict/1760554355 2025-12-04T09:17:19.1906936Z * [new tag] viable/strict/1760556275 -> viable/strict/1760556275 2025-12-04T09:17:19.1908427Z * [new tag] viable/strict/1760564979 -> viable/strict/1760564979 2025-12-04T09:17:19.1910179Z * [new tag] viable/strict/1760567049 -> viable/strict/1760567049 2025-12-04T09:17:19.1912074Z * [new tag] viable/strict/1760568585 -> viable/strict/1760568585 2025-12-04T09:17:19.1913546Z * [new tag] viable/strict/1760570630 -> viable/strict/1760570630 2025-12-04T09:17:19.1915021Z * [new tag] viable/strict/1760572180 -> viable/strict/1760572180 
2025-12-04T09:17:19.1916564Z * [new tag] viable/strict/1760575094 -> viable/strict/1760575094 2025-12-04T09:17:19.1918150Z * [new tag] viable/strict/1760579709 -> viable/strict/1760579709 2025-12-04T09:17:19.1920170Z * [new tag] viable/strict/1760582614 -> viable/strict/1760582614 2025-12-04T09:17:19.1921626Z * [new tag] viable/strict/1760586815 -> viable/strict/1760586815 2025-12-04T09:17:19.1922992Z * [new tag] viable/strict/1760588829 -> viable/strict/1760588829 2025-12-04T09:17:19.1924531Z * [new tag] viable/strict/1760590200 -> viable/strict/1760590200 2025-12-04T09:17:19.1926092Z * [new tag] viable/strict/1760592311 -> viable/strict/1760592311 2025-12-04T09:17:19.1927556Z * [new tag] viable/strict/1760619733 -> viable/strict/1760619733 2025-12-04T09:17:19.1928886Z * [new tag] viable/strict/1760628335 -> viable/strict/1760628335 2025-12-04T09:17:19.1930381Z * [new tag] viable/strict/1760635490 -> viable/strict/1760635490 2025-12-04T09:17:19.1931838Z * [new tag] viable/strict/1760640743 -> viable/strict/1760640743 2025-12-04T09:17:19.1933381Z * [new tag] viable/strict/1760642528 -> viable/strict/1760642528 2025-12-04T09:17:19.1934851Z * [new tag] viable/strict/1760646330 -> viable/strict/1760646330 2025-12-04T09:17:19.1936517Z * [new tag] viable/strict/1760666101 -> viable/strict/1760666101 2025-12-04T09:17:19.1937959Z * [new tag] viable/strict/1760668990 -> viable/strict/1760668990 2025-12-04T09:17:19.1939527Z * [new tag] viable/strict/1760670600 -> viable/strict/1760670600 2025-12-04T09:17:19.1941120Z * [new tag] viable/strict/1760671704 -> viable/strict/1760671704 2025-12-04T09:17:19.1942640Z * [new tag] viable/strict/1760673121 -> viable/strict/1760673121 2025-12-04T09:17:19.1944104Z * [new tag] viable/strict/1760675352 -> viable/strict/1760675352 2025-12-04T09:17:19.1945594Z * [new tag] viable/strict/1760696731 -> viable/strict/1760696731 2025-12-04T09:17:19.1948511Z * [new tag] viable/strict/1760723515 -> viable/strict/1760723515 
2025-12-04T09:17:19.1949993Z * [new tag] viable/strict/1760727234 -> viable/strict/1760727234 2025-12-04T09:17:19.1951513Z * [new tag] viable/strict/1760730578 -> viable/strict/1760730578 2025-12-04T09:17:19.1953023Z * [new tag] viable/strict/1760732726 -> viable/strict/1760732726 2025-12-04T09:17:19.1954700Z * [new tag] viable/strict/1760734180 -> viable/strict/1760734180 2025-12-04T09:17:19.1956104Z * [new tag] viable/strict/1760736251 -> viable/strict/1760736251 2025-12-04T09:17:19.1957572Z * [new tag] viable/strict/1760737772 -> viable/strict/1760737772 2025-12-04T09:17:19.1959136Z * [new tag] viable/strict/1760758005 -> viable/strict/1760758005 2025-12-04T09:17:19.1960586Z * [new tag] viable/strict/1760761532 -> viable/strict/1760761532 2025-12-04T09:17:19.1962136Z * [new tag] viable/strict/1760802581 -> viable/strict/1760802581 2025-12-04T09:17:19.1963588Z * [new tag] viable/strict/1760827772 -> viable/strict/1760827772 2025-12-04T09:17:19.1965078Z * [new tag] viable/strict/1760834524 -> viable/strict/1760834524 2025-12-04T09:17:19.1966667Z * [new tag] viable/strict/1760845009 -> viable/strict/1760845009 2025-12-04T09:17:19.1968229Z * [new tag] viable/strict/1760876836 -> viable/strict/1760876836 2025-12-04T09:17:19.1969730Z * [new tag] viable/strict/1760880329 -> viable/strict/1760880329 2025-12-04T09:17:19.1971192Z * [new tag] viable/strict/1760888987 -> viable/strict/1760888987 2025-12-04T09:17:19.1972651Z * [new tag] viable/strict/1760912664 -> viable/strict/1760912664 2025-12-04T09:17:19.1974241Z * [new tag] viable/strict/1760925321 -> viable/strict/1760925321 2025-12-04T09:17:19.1975673Z * [new tag] viable/strict/1760931488 -> viable/strict/1760931488 2025-12-04T09:17:19.1977177Z * [new tag] viable/strict/1760932693 -> viable/strict/1760932693 2025-12-04T09:17:19.1978687Z * [new tag] viable/strict/1761004184 -> viable/strict/1761004184 2025-12-04T09:17:19.1980369Z * [new tag] viable/strict/1761014748 -> viable/strict/1761014748 
2025-12-04T09:17:19.1981872Z * [new tag] viable/strict/1761017491 -> viable/strict/1761017491 2025-12-04T09:17:19.1983398Z * [new tag] viable/strict/1761018806 -> viable/strict/1761018806 2025-12-04T09:17:19.1984981Z * [new tag] viable/strict/1761020754 -> viable/strict/1761020754 2025-12-04T09:17:19.1986534Z * [new tag] viable/strict/1761024303 -> viable/strict/1761024303 2025-12-04T09:17:19.1988451Z * [new tag] viable/strict/1761029582 -> viable/strict/1761029582 2025-12-04T09:17:19.1989965Z * [new tag] viable/strict/1761031535 -> viable/strict/1761031535 2025-12-04T09:17:19.1991448Z * [new tag] viable/strict/1761035196 -> viable/strict/1761035196 2025-12-04T09:17:19.1992992Z * [new tag] viable/strict/1761045825 -> viable/strict/1761045825 2025-12-04T09:17:19.1994508Z * [new tag] viable/strict/1761054796 -> viable/strict/1761054796 2025-12-04T09:17:19.1996110Z * [new tag] viable/strict/1761060314 -> viable/strict/1761060314 2025-12-04T09:17:19.1997650Z * [new tag] viable/strict/1761071198 -> viable/strict/1761071198 2025-12-04T09:17:19.1999182Z * [new tag] viable/strict/1761074628 -> viable/strict/1761074628 2025-12-04T09:17:19.2000688Z * [new tag] viable/strict/1761078351 -> viable/strict/1761078351 2025-12-04T09:17:19.2002220Z * [new tag] viable/strict/1761079822 -> viable/strict/1761079822 2025-12-04T09:17:19.2003716Z * [new tag] viable/strict/1761081873 -> viable/strict/1761081873 2025-12-04T09:17:19.2005172Z * [new tag] viable/strict/1761083392 -> viable/strict/1761083392 2025-12-04T09:17:19.2006684Z * [new tag] viable/strict/1761085465 -> viable/strict/1761085465 2025-12-04T09:17:19.2008423Z * [new tag] viable/strict/1761089099 -> viable/strict/1761089099 2025-12-04T09:17:19.2010168Z * [new tag] viable/strict/1761095535 -> viable/strict/1761095535 2025-12-04T09:17:19.2011306Z * [new tag] viable/strict/1761098119 -> viable/strict/1761098119 2025-12-04T09:17:19.2013346Z * [new tag] viable/strict/1761101330 -> viable/strict/1761101330 
2025-12-04T09:17:19.2014857Z * [new tag] viable/strict/1761114425 -> viable/strict/1761114425 2025-12-04T09:17:19.2016333Z * [new tag] viable/strict/1761116036 -> viable/strict/1761116036 2025-12-04T09:17:19.2017835Z * [new tag] viable/strict/1761119379 -> viable/strict/1761119379 2025-12-04T09:17:19.2019359Z * [new tag] viable/strict/1761121601 -> viable/strict/1761121601 2025-12-04T09:17:19.2020969Z * [new tag] viable/strict/1761123234 -> viable/strict/1761123234 2025-12-04T09:17:19.2022558Z * [new tag] viable/strict/1761126621 -> viable/strict/1761126621 2025-12-04T09:17:19.2023969Z * [new tag] viable/strict/1761132259 -> viable/strict/1761132259 2025-12-04T09:17:19.2025537Z * [new tag] viable/strict/1761146746 -> viable/strict/1761146746 2025-12-04T09:17:19.2027033Z * [new tag] viable/strict/1761164752 -> viable/strict/1761164752 2025-12-04T09:17:19.2028523Z * [new tag] viable/strict/1761166198 -> viable/strict/1761166198 2025-12-04T09:17:19.2030046Z * [new tag] viable/strict/1761175424 -> viable/strict/1761175424 2025-12-04T09:17:19.2031562Z * [new tag] viable/strict/1761176983 -> viable/strict/1761176983 2025-12-04T09:17:19.2033186Z * [new tag] viable/strict/1761179891 -> viable/strict/1761179891 2025-12-04T09:17:19.2034778Z * [new tag] viable/strict/1761181930 -> viable/strict/1761181930 2025-12-04T09:17:19.2036419Z * [new tag] viable/strict/1761184516 -> viable/strict/1761184516 2025-12-04T09:17:19.2038018Z * [new tag] viable/strict/1761190179 -> viable/strict/1761190179 2025-12-04T09:17:19.2039528Z * [new tag] viable/strict/1761193558 -> viable/strict/1761193558 2025-12-04T09:17:19.2041049Z * [new tag] viable/strict/1761207990 -> viable/strict/1761207990 2025-12-04T09:17:19.2042496Z * [new tag] viable/strict/1761229539 -> viable/strict/1761229539 2025-12-04T09:17:19.2044197Z * [new tag] viable/strict/1761244031 -> viable/strict/1761244031 2025-12-04T09:17:19.2045703Z * [new tag] viable/strict/1761248986 -> viable/strict/1761248986 
2025-12-04T09:17:19.2047270Z * [new tag] viable/strict/1761259791 -> viable/strict/1761259791 2025-12-04T09:17:19.2048734Z * [new tag] viable/strict/1761266139 -> viable/strict/1761266139 2025-12-04T09:17:19.2050262Z * [new tag] viable/strict/1761268316 -> viable/strict/1761268316 2025-12-04T09:17:19.2051746Z * [new tag] viable/strict/1761273805 -> viable/strict/1761273805 2025-12-04T09:17:19.2053260Z * [new tag] viable/strict/1761275261 -> viable/strict/1761275261 2025-12-04T09:17:19.2054785Z * [new tag] viable/strict/1761277913 -> viable/strict/1761277913 2025-12-04T09:17:19.2056377Z * [new tag] viable/strict/1761290701 -> viable/strict/1761290701 2025-12-04T09:17:19.2057891Z * [new tag] viable/strict/1761294396 -> viable/strict/1761294396 2025-12-04T09:17:19.2059510Z * [new tag] viable/strict/1761303047 -> viable/strict/1761303047 2025-12-04T09:17:19.2060989Z * [new tag] viable/strict/1761335388 -> viable/strict/1761335388 2025-12-04T09:17:19.2062543Z * [new tag] viable/strict/1761337551 -> viable/strict/1761337551 2025-12-04T09:17:19.2064135Z * [new tag] viable/strict/1761339007 -> viable/strict/1761339007 2025-12-04T09:17:19.2065607Z * [new tag] viable/strict/1761341050 -> viable/strict/1761341050 2025-12-04T09:17:19.2067048Z * [new tag] viable/strict/1761346188 -> viable/strict/1761346188 2025-12-04T09:17:19.2068609Z * [new tag] viable/strict/1761349792 -> viable/strict/1761349792 2025-12-04T09:17:19.2070216Z * [new tag] viable/strict/1761352620 -> viable/strict/1761352620 2025-12-04T09:17:19.2071722Z * [new tag] viable/strict/1761354730 -> viable/strict/1761354730 2025-12-04T09:17:19.2073253Z * [new tag] viable/strict/1761357298 -> viable/strict/1761357298 2025-12-04T09:17:19.2074758Z * [new tag] viable/strict/1761360201 -> viable/strict/1761360201 2025-12-04T09:17:19.2076279Z * [new tag] viable/strict/1761361753 -> viable/strict/1761361753 2025-12-04T09:17:19.2077795Z * [new tag] viable/strict/1761364351 -> viable/strict/1761364351 
2025-12-04T09:17:19.2079351Z * [new tag] viable/strict/1761366338 -> viable/strict/1761366338 2025-12-04T09:17:19.2080957Z * [new tag] viable/strict/1761367802 -> viable/strict/1761367802 2025-12-04T09:17:19.2082438Z * [new tag] viable/strict/1761369889 -> viable/strict/1761369889 2025-12-04T09:17:19.2084009Z * [new tag] viable/strict/1761371385 -> viable/strict/1761371385 2025-12-04T09:17:19.2085888Z * [new tag] viable/strict/1761373581 -> viable/strict/1761373581 2025-12-04T09:17:19.2088036Z * [new tag] viable/strict/1761375054 -> viable/strict/1761375054 2025-12-04T09:17:19.2089584Z * [new tag] viable/strict/1761421785 -> viable/strict/1761421785 2025-12-04T09:17:19.2091167Z * [new tag] viable/strict/1761434614 -> viable/strict/1761434614 2025-12-04T09:17:19.2093034Z * [new tag] viable/strict/1761439254 -> viable/strict/1761439254 2025-12-04T09:17:19.2094587Z * [new tag] viable/strict/1761454187 -> viable/strict/1761454187 2025-12-04T09:17:19.2096278Z * [new tag] viable/strict/1761459991 -> viable/strict/1761459991 2025-12-04T09:17:19.2098053Z * [new tag] viable/strict/1761470668 -> viable/strict/1761470668 2025-12-04T09:17:19.2100115Z * [new tag] viable/strict/1761472188 -> viable/strict/1761472188 2025-12-04T09:17:19.2101608Z * [new tag] viable/strict/1761503178 -> viable/strict/1761503178 2025-12-04T09:17:19.2103116Z * [new tag] viable/strict/1761517492 -> viable/strict/1761517492 2025-12-04T09:17:19.2104636Z * [new tag] viable/strict/1761518981 -> viable/strict/1761518981 2025-12-04T09:17:19.2106257Z * [new tag] viable/strict/1761533609 -> viable/strict/1761533609 2025-12-04T09:17:19.2107616Z * [new tag] viable/strict/1761546438 -> viable/strict/1761546438 2025-12-04T09:17:19.2109456Z * [new tag] viable/strict/1761548133 -> viable/strict/1761548133 2025-12-04T09:17:19.2111245Z * [new tag] viable/strict/1761555186 -> viable/strict/1761555186 2025-12-04T09:17:19.2112910Z * [new tag] viable/strict/1761557178 -> viable/strict/1761557178 
2025-12-04T09:17:19.2114425Z * [new tag] viable/strict/1761560772 -> viable/strict/1761560772 2025-12-04T09:17:19.2115994Z * [new tag] viable/strict/1761562266 -> viable/strict/1761562266 2025-12-04T09:17:19.2117559Z * [new tag] viable/strict/1761564260 -> viable/strict/1761564260 2025-12-04T09:17:19.2119051Z * [new tag] viable/strict/1761568072 -> viable/strict/1761568072 2025-12-04T09:17:19.2120585Z * [new tag] viable/strict/1761571683 -> viable/strict/1761571683 2025-12-04T09:17:19.2122153Z * [new tag] viable/strict/1761580199 -> viable/strict/1761580199 2025-12-04T09:17:19.2123536Z * [new tag] viable/strict/1761587383 -> viable/strict/1761587383 2025-12-04T09:17:19.2125119Z * [new tag] viable/strict/1761591165 -> viable/strict/1761591165 2025-12-04T09:17:19.2126627Z * [new tag] viable/strict/1761594575 -> viable/strict/1761594575 2025-12-04T09:17:19.2128205Z * [new tag] viable/strict/1761596710 -> viable/strict/1761596710 2025-12-04T09:17:19.2129714Z * [new tag] viable/strict/1761598189 -> viable/strict/1761598189 2025-12-04T09:17:19.2131225Z * [new tag] viable/strict/1761600254 -> viable/strict/1761600254 2025-12-04T09:17:19.2132756Z * [new tag] viable/strict/1761603879 -> viable/strict/1761603879 2025-12-04T09:17:19.2134326Z * [new tag] viable/strict/1761605429 -> viable/strict/1761605429 2025-12-04T09:17:19.2135944Z * [new tag] viable/strict/1761607468 -> viable/strict/1761607468 2025-12-04T09:17:19.2137554Z * [new tag] viable/strict/1761608983 -> viable/strict/1761608983 2025-12-04T09:17:19.2139138Z * [new tag] viable/strict/1761611846 -> viable/strict/1761611846 2025-12-04T09:17:19.2140815Z * [new tag] viable/strict/1761613922 -> viable/strict/1761613922 2025-12-04T09:17:19.2142175Z * [new tag] viable/strict/1761616504 -> viable/strict/1761616504 2025-12-04T09:17:19.2143505Z * [new tag] viable/strict/1761619599 -> viable/strict/1761619599 2025-12-04T09:17:19.2145113Z * [new tag] viable/strict/1761686693 -> viable/strict/1761686693 
2025-12-04T09:17:19.2146686Z * [new tag] viable/strict/1761688179 -> viable/strict/1761688179 2025-12-04T09:17:19.2148176Z * [new tag] viable/strict/1761691973 -> viable/strict/1761691973 2025-12-04T09:17:19.2149798Z * [new tag] viable/strict/1761693884 -> viable/strict/1761693884 2025-12-04T09:17:19.2151402Z * [new tag] viable/strict/1761695389 -> viable/strict/1761695389 2025-12-04T09:17:19.2152984Z * [new tag] viable/strict/1761698408 -> viable/strict/1761698408 2025-12-04T09:17:19.2154445Z * [new tag] viable/strict/1761702931 -> viable/strict/1761702931 2025-12-04T09:17:19.2156028Z * [new tag] viable/strict/1761706307 -> viable/strict/1761706307 2025-12-04T09:17:19.2157605Z * [new tag] viable/strict/1761709065 -> viable/strict/1761709065 2025-12-04T09:17:19.2159297Z * [new tag] viable/strict/1761710285 -> viable/strict/1761710285 2025-12-04T09:17:19.2160866Z * [new tag] viable/strict/1761711983 -> viable/strict/1761711983 2025-12-04T09:17:19.2162472Z * [new tag] viable/strict/1761713514 -> viable/strict/1761713514 2025-12-04T09:17:19.2164180Z * [new tag] viable/strict/1761715523 -> viable/strict/1761715523 2025-12-04T09:17:19.2165863Z * [new tag] viable/strict/1761727973 -> viable/strict/1761727973 2025-12-04T09:17:19.2167485Z * [new tag] viable/strict/1761751558 -> viable/strict/1761751558 2025-12-04T09:17:19.2169149Z * [new tag] viable/strict/1761755187 -> viable/strict/1761755187 2025-12-04T09:17:19.2170727Z * [new tag] viable/strict/1761756826 -> viable/strict/1761756826 2025-12-04T09:17:19.2172343Z * [new tag] viable/strict/1761769551 -> viable/strict/1761769551 2025-12-04T09:17:19.2173960Z * [new tag] viable/strict/1761771032 -> viable/strict/1761771032 2025-12-04T09:17:19.2175489Z * [new tag] viable/strict/1761773101 -> viable/strict/1761773101 2025-12-04T09:17:19.2177122Z * [new tag] viable/strict/1761781792 -> viable/strict/1761781792 2025-12-04T09:17:19.2178889Z * [new tag] viable/strict/1761784788 -> viable/strict/1761784788 
2025-12-04T09:17:19.2180510Z * [new tag] viable/strict/1761786740 -> viable/strict/1761786740 2025-12-04T09:17:19.2182116Z * [new tag] viable/strict/1761789332 -> viable/strict/1761789332 2025-12-04T09:17:19.2184172Z * [new tag] viable/strict/1761792569 -> viable/strict/1761792569 2025-12-04T09:17:19.2185754Z * [new tag] viable/strict/1761795289 -> viable/strict/1761795289 2025-12-04T09:17:19.2187317Z * [new tag] viable/strict/1761798345 -> viable/strict/1761798345 2025-12-04T09:17:19.2189006Z * [new tag] viable/strict/1761799827 -> viable/strict/1761799827 2025-12-04T09:17:19.2191167Z * [new tag] viable/strict/1761805604 -> viable/strict/1761805604 2025-12-04T09:17:19.2192726Z * [new tag] viable/strict/1761807202 -> viable/strict/1761807202 2025-12-04T09:17:19.2194307Z * [new tag] viable/strict/1761809094 -> viable/strict/1761809094 2025-12-04T09:17:19.2195883Z * [new tag] viable/strict/1761810576 -> viable/strict/1761810576 2025-12-04T09:17:19.2197504Z * [new tag] viable/strict/1761812771 -> viable/strict/1761812771 2025-12-04T09:17:19.2199090Z * [new tag] viable/strict/1761814363 -> viable/strict/1761814363 2025-12-04T09:17:19.2200700Z * [new tag] viable/strict/1761857410 -> viable/strict/1761857410 2025-12-04T09:17:19.2202323Z * [new tag] viable/strict/1761860985 -> viable/strict/1761860985 2025-12-04T09:17:19.2203947Z * [new tag] viable/strict/1761863094 -> viable/strict/1761863094 2025-12-04T09:17:19.2205499Z * [new tag] viable/strict/1761864590 -> viable/strict/1761864590 2025-12-04T09:17:19.2207084Z * [new tag] viable/strict/1761866675 -> viable/strict/1761866675 2025-12-04T09:17:19.2208892Z * [new tag] viable/strict/1761868178 -> viable/strict/1761868178 2025-12-04T09:17:19.2213332Z * [new tag] viable/strict/1761871111 -> viable/strict/1761871111 2025-12-04T09:17:19.2214931Z * [new tag] viable/strict/1761873126 -> viable/strict/1761873126 2025-12-04T09:17:19.2216541Z * [new tag] viable/strict/1761875714 -> viable/strict/1761875714 
2025-12-04T09:17:19.2218147Z * [new tag] viable/strict/1761878924 -> viable/strict/1761878924 2025-12-04T09:17:19.2219910Z * [new tag] viable/strict/1761881727 -> viable/strict/1761881727 2025-12-04T09:17:19.2221502Z * [new tag] viable/strict/1761882959 -> viable/strict/1761882959 2025-12-04T09:17:19.2223103Z * [new tag] viable/strict/1761886268 -> viable/strict/1761886268 2025-12-04T09:17:19.2224676Z * [new tag] viable/strict/1761893641 -> viable/strict/1761893641 2025-12-04T09:17:19.2226320Z * [new tag] viable/strict/1761931517 -> viable/strict/1761931517 2025-12-04T09:17:19.2227883Z * [new tag] viable/strict/1761933080 -> viable/strict/1761933080 2025-12-04T09:17:19.2229434Z * [new tag] viable/strict/1761935217 -> viable/strict/1761935217 2025-12-04T09:17:19.2231141Z * [new tag] viable/strict/1761938533 -> viable/strict/1761938533 2025-12-04T09:17:19.2232808Z * [new tag] viable/strict/1761940184 -> viable/strict/1761940184 2025-12-04T09:17:19.2234398Z * [new tag] viable/strict/1761942338 -> viable/strict/1761942338 2025-12-04T09:17:19.2236005Z * [new tag] viable/strict/1761946100 -> viable/strict/1761946100 2025-12-04T09:17:19.2237557Z * [new tag] viable/strict/1761947374 -> viable/strict/1761947374 2025-12-04T09:17:19.2239130Z * [new tag] viable/strict/1761950978 -> viable/strict/1761950978 2025-12-04T09:17:19.2240850Z * [new tag] viable/strict/1761957727 -> viable/strict/1761957727 2025-12-04T09:17:19.2242344Z * [new tag] viable/strict/1761959532 -> viable/strict/1761959532 2025-12-04T09:17:19.2244030Z * [new tag] viable/strict/1761965366 -> viable/strict/1761965366 2025-12-04T09:17:19.2245911Z * [new tag] viable/strict/1761968066 -> viable/strict/1761968066 2025-12-04T09:17:19.2247471Z * [new tag] viable/strict/1761969322 -> viable/strict/1761969322 2025-12-04T09:17:19.2249048Z * [new tag] viable/strict/1761974723 -> viable/strict/1761974723 2025-12-04T09:17:19.2250735Z * [new tag] viable/strict/1761981837 -> viable/strict/1761981837 
2025-12-04T09:17:19.2252390Z * [new tag] viable/strict/1761985546 -> viable/strict/1761985546 2025-12-04T09:17:19.2254003Z * [new tag] viable/strict/1761987030 -> viable/strict/1761987030 2025-12-04T09:17:19.2255637Z * [new tag] viable/strict/1762003554 -> viable/strict/1762003554 2025-12-04T09:17:19.2257218Z * [new tag] viable/strict/1762021560 -> viable/strict/1762021560 2025-12-04T09:17:19.2258850Z * [new tag] viable/strict/1762032190 -> viable/strict/1762032190 2025-12-04T09:17:19.2260635Z * [new tag] viable/strict/1762040981 -> viable/strict/1762040981 2025-12-04T09:17:19.2262288Z * [new tag] viable/strict/1762048525 -> viable/strict/1762048525 2025-12-04T09:17:19.2263905Z * [new tag] viable/strict/1762104223 -> viable/strict/1762104223 2025-12-04T09:17:19.2265562Z * [new tag] viable/strict/1762105778 -> viable/strict/1762105778 2025-12-04T09:17:19.2267076Z * [new tag] viable/strict/1762115109 -> viable/strict/1762115109 2025-12-04T09:17:19.2268626Z * [new tag] viable/strict/1762125840 -> viable/strict/1762125840 2025-12-04T09:17:19.2270124Z * [new tag] viable/strict/1762127377 -> viable/strict/1762127377 2025-12-04T09:17:19.2272017Z * [new tag] viable/strict/1762134925 -> viable/strict/1762134925 2025-12-04T09:17:19.2273516Z * [new tag] viable/strict/1762138338 -> viable/strict/1762138338 2025-12-04T09:17:19.2275137Z * [new tag] viable/strict/1762148993 -> viable/strict/1762148993 2025-12-04T09:17:19.2276787Z * [new tag] viable/strict/1762152871 -> viable/strict/1762152871 2025-12-04T09:17:19.2278357Z * [new tag] viable/strict/1762156183 -> viable/strict/1762156183 2025-12-04T09:17:19.2279914Z * [new tag] viable/strict/1762163457 -> viable/strict/1762163457 2025-12-04T09:17:19.2281494Z * [new tag] viable/strict/1762165569 -> viable/strict/1762165569 2025-12-04T09:17:19.2283061Z * [new tag] viable/strict/1762169035 -> viable/strict/1762169035 2025-12-04T09:17:19.2284779Z * [new tag] viable/strict/1762174936 -> viable/strict/1762174936 
2025-12-04T09:17:19.2286332Z * [new tag] viable/strict/1762194412 -> viable/strict/1762194412 2025-12-04T09:17:19.2287916Z * [new tag] viable/strict/1762195876 -> viable/strict/1762195876 2025-12-04T09:17:19.2289527Z * [new tag] viable/strict/1762197788 -> viable/strict/1762197788 2025-12-04T09:17:19.2291130Z * [new tag] viable/strict/1762199389 -> viable/strict/1762199389 2025-12-04T09:17:19.2292943Z * [new tag] viable/strict/1762206585 -> viable/strict/1762206585 2025-12-04T09:17:19.2294605Z * [new tag] viable/strict/1762210184 -> viable/strict/1762210184 2025-12-04T09:17:19.2296193Z * [new tag] viable/strict/1762218736 -> viable/strict/1762218736 2025-12-04T09:17:19.2298301Z * [new tag] viable/strict/1762224529 -> viable/strict/1762224529 2025-12-04T09:17:19.2300177Z * [new tag] viable/strict/1762227253 -> viable/strict/1762227253 2025-12-04T09:17:19.2301525Z * [new tag] viable/strict/1762228515 -> viable/strict/1762228515 2025-12-04T09:17:19.2303217Z * [new tag] viable/strict/1762230349 -> viable/strict/1762230349 2025-12-04T09:17:19.2304880Z * [new tag] viable/strict/1762231859 -> viable/strict/1762231859 2025-12-04T09:17:19.2306466Z * [new tag] viable/strict/1762233925 -> viable/strict/1762233925 2025-12-04T09:17:19.2308425Z * [new tag] viable/strict/1762237630 -> viable/strict/1762237630 2025-12-04T09:17:19.2309832Z * [new tag] viable/strict/1762253522 -> viable/strict/1762253522 2025-12-04T09:17:19.2311645Z * [new tag] viable/strict/1762278588 -> viable/strict/1762278588 2025-12-04T09:17:19.2313263Z * [new tag] viable/strict/1762284203 -> viable/strict/1762284203 2025-12-04T09:17:19.2314881Z * [new tag] viable/strict/1762289446 -> viable/strict/1762289446 2025-12-04T09:17:19.2316520Z * [new tag] viable/strict/1762291515 -> viable/strict/1762291515 2025-12-04T09:17:19.2318085Z * [new tag] viable/strict/1762295100 -> viable/strict/1762295100 2025-12-04T09:17:19.2319663Z * [new tag] viable/strict/1762296590 -> viable/strict/1762296590 
2025-12-04T09:17:19.2321265Z * [new tag] viable/strict/1762300179 -> viable/strict/1762300179 2025-12-04T09:17:19.2322747Z * [new tag] viable/strict/1762303207 -> viable/strict/1762303207 2025-12-04T09:17:19.2324355Z * [new tag] viable/strict/1762386584 -> viable/strict/1762386584 2025-12-04T09:17:19.2325968Z * [new tag] viable/strict/1762391537 -> viable/strict/1762391537 2025-12-04T09:17:19.2327428Z * [new tag] viable/strict/1762394119 -> viable/strict/1762394119 2025-12-04T09:17:19.2329342Z * [new tag] viable/strict/1762397437 -> viable/strict/1762397437 2025-12-04T09:17:19.2330918Z * [new tag] viable/strict/1762400256 -> viable/strict/1762400256 2025-12-04T09:17:19.2332492Z * [new tag] viable/strict/1762401469 -> viable/strict/1762401469 2025-12-04T09:17:19.2334183Z * [new tag] viable/strict/1762408195 -> viable/strict/1762408195 2025-12-04T09:17:19.2336033Z * [new tag] viable/strict/1762410411 -> viable/strict/1762410411 2025-12-04T09:17:19.2337693Z * [new tag] viable/strict/1762417613 -> viable/strict/1762417613 2025-12-04T09:17:19.2339416Z * [new tag] viable/strict/1762419198 -> viable/strict/1762419198 2025-12-04T09:17:19.2341144Z * [new tag] viable/strict/1762422656 -> viable/strict/1762422656 2025-12-04T09:17:19.2343145Z * [new tag] viable/strict/1762424746 -> viable/strict/1762424746 2025-12-04T09:17:19.2344765Z * [new tag] viable/strict/1762446386 -> viable/strict/1762446386 2025-12-04T09:17:19.2346397Z * [new tag] viable/strict/1762449912 -> viable/strict/1762449912 2025-12-04T09:17:19.2347995Z * [new tag] viable/strict/1762457031 -> viable/strict/1762457031 2025-12-04T09:17:19.2349748Z * [new tag] viable/strict/1762462441 -> viable/strict/1762462441 2025-12-04T09:17:19.2351293Z * [new tag] viable/strict/1762467909 -> viable/strict/1762467909 2025-12-04T09:17:19.2352932Z * [new tag] viable/strict/1762471493 -> viable/strict/1762471493 2025-12-04T09:17:19.2354580Z * [new tag] viable/strict/1762475990 -> viable/strict/1762475990 
2025-12-04T09:17:19.2356334Z * [new tag] viable/strict/1762477933 -> viable/strict/1762477933 2025-12-04T09:17:19.2357916Z * [new tag] viable/strict/1762491053 -> viable/strict/1762491053 2025-12-04T09:17:19.2359662Z * [new tag] viable/strict/1762493118 -> viable/strict/1762493118 2025-12-04T09:17:19.2361122Z * [new tag] viable/strict/1762498442 -> viable/strict/1762498442 2025-12-04T09:17:19.2362761Z * [new tag] viable/strict/1762501778 -> viable/strict/1762501778 2025-12-04T09:17:19.2364415Z * [new tag] viable/strict/1762504001 -> viable/strict/1762504001 2025-12-04T09:17:19.2366051Z * [new tag] viable/strict/1762505583 -> viable/strict/1762505583 2025-12-04T09:17:19.2367789Z * [new tag] viable/strict/1762507523 -> viable/strict/1762507523 2025-12-04T09:17:19.2369427Z * [new tag] viable/strict/1762511140 -> viable/strict/1762511140 2025-12-04T09:17:19.2372501Z * [new tag] viable/strict/1762512632 -> viable/strict/1762512632 2025-12-04T09:17:19.2372843Z * [new tag] viable/strict/1762520467 -> viable/strict/1762520467 2025-12-04T09:17:19.2374458Z * [new tag] viable/strict/1762522016 -> viable/strict/1762522016 2025-12-04T09:17:19.2376021Z * [new tag] viable/strict/1762530591 -> viable/strict/1762530591 2025-12-04T09:17:19.2377580Z * [new tag] viable/strict/1762543405 -> viable/strict/1762543405 2025-12-04T09:17:19.2379070Z * [new tag] viable/strict/1762544998 -> viable/strict/1762544998 2025-12-04T09:17:19.2380723Z * [new tag] viable/strict/1762552182 -> viable/strict/1762552182 2025-12-04T09:17:19.2382353Z * [new tag] viable/strict/1762554297 -> viable/strict/1762554297 2025-12-04T09:17:19.2383776Z * [new tag] viable/strict/1762559381 -> viable/strict/1762559381 2025-12-04T09:17:19.2385380Z * [new tag] viable/strict/1762562222 -> viable/strict/1762562222 2025-12-04T09:17:19.2386961Z * [new tag] viable/strict/1762564319 -> viable/strict/1762564319 2025-12-04T09:17:19.2388480Z * [new tag] viable/strict/1762566904 -> viable/strict/1762566904 
2025-12-04T09:17:19.2390092Z * [new tag] viable/strict/1762569781 -> viable/strict/1762569781 2025-12-04T09:17:19.2391666Z * [new tag] viable/strict/1762575940 -> viable/strict/1762575940 2025-12-04T09:17:19.2393273Z * [new tag] viable/strict/1762580974 -> viable/strict/1762580974 2025-12-04T09:17:19.2394954Z * [new tag] viable/strict/1762583185 -> viable/strict/1762583185 2025-12-04T09:17:19.2396493Z * [new tag] viable/strict/1762586647 -> viable/strict/1762586647 2025-12-04T09:17:19.2398072Z * [new tag] viable/strict/1762588183 -> viable/strict/1762588183 2025-12-04T09:17:19.2399673Z * [new tag] viable/strict/1762593886 -> viable/strict/1762593886 2025-12-04T09:17:19.2401466Z * [new tag] viable/strict/1762650743 -> viable/strict/1762650743 2025-12-04T09:17:19.2403601Z * [new tag] viable/strict/1762653328 -> viable/strict/1762653328 2025-12-04T09:17:19.2405234Z * [new tag] viable/strict/1762659342 -> viable/strict/1762659342 2025-12-04T09:17:19.2406867Z * [new tag] viable/strict/1762662360 -> viable/strict/1762662360 2025-12-04T09:17:19.2408766Z * [new tag] viable/strict/1762667377 -> viable/strict/1762667377 2025-12-04T09:17:19.2410266Z * [new tag] viable/strict/1762671090 -> viable/strict/1762671090 2025-12-04T09:17:19.2411870Z * [new tag] viable/strict/1762680284 -> viable/strict/1762680284 2025-12-04T09:17:19.2413526Z * [new tag] viable/strict/1762683900 -> viable/strict/1762683900 2025-12-04T09:17:19.2415132Z * [new tag] viable/strict/1762705541 -> viable/strict/1762705541 2025-12-04T09:17:19.2416766Z * [new tag] viable/strict/1762709004 -> viable/strict/1762709004 2025-12-04T09:17:19.2418558Z * [new tag] viable/strict/1762746004 -> viable/strict/1762746004 2025-12-04T09:17:19.2420365Z * [new tag] viable/strict/1762748799 -> viable/strict/1762748799 2025-12-04T09:17:19.2421960Z * [new tag] viable/strict/1762759504 -> viable/strict/1762759504 2025-12-04T09:17:19.2423706Z * [new tag] viable/strict/1762760973 -> viable/strict/1762760973 
2025-12-04T09:17:19.2425290Z * [new tag] viable/strict/1762775374 -> viable/strict/1762775374 2025-12-04T09:17:19.2426960Z * [new tag] viable/strict/1762777661 -> viable/strict/1762777661 2025-12-04T09:17:19.2428542Z * [new tag] viable/strict/1762779774 -> viable/strict/1762779774 2025-12-04T09:17:19.2430279Z * [new tag] viable/strict/1762781259 -> viable/strict/1762781259 2025-12-04T09:17:19.2431910Z * [new tag] viable/strict/1762793628 -> viable/strict/1762793628 2025-12-04T09:17:19.2433609Z * [new tag] viable/strict/1762800711 -> viable/strict/1762800711 2025-12-04T09:17:19.2435217Z * [new tag] viable/strict/1762809894 -> viable/strict/1762809894 2025-12-04T09:17:19.2436807Z * [new tag] viable/strict/1762811384 -> viable/strict/1762811384 2025-12-04T09:17:19.2438480Z * [new tag] viable/strict/1762813841 -> viable/strict/1762813841 2025-12-04T09:17:19.2440176Z * [new tag] viable/strict/1762815047 -> viable/strict/1762815047 2025-12-04T09:17:19.2441908Z * [new tag] viable/strict/1762817094 -> viable/strict/1762817094 2025-12-04T09:17:19.2443534Z * [new tag] viable/strict/1762818582 -> viable/strict/1762818582 2025-12-04T09:17:19.2445130Z * [new tag] viable/strict/1762821623 -> viable/strict/1762821623 2025-12-04T09:17:19.2446550Z * [new tag] viable/strict/1762823531 -> viable/strict/1762823531 2025-12-04T09:17:19.2448269Z * [new tag] viable/strict/1762849583 -> viable/strict/1762849583 2025-12-04T09:17:19.2449874Z * [new tag] viable/strict/1762851200 -> viable/strict/1762851200 2025-12-04T09:17:19.2451471Z * [new tag] viable/strict/1762854603 -> viable/strict/1762854603 2025-12-04T09:17:19.2453097Z * [new tag] viable/strict/1762858276 -> viable/strict/1762858276 2025-12-04T09:17:19.2454852Z * [new tag] viable/strict/1762860891 -> viable/strict/1762860891 2025-12-04T09:17:19.2457079Z * [new tag] viable/strict/1762866174 -> viable/strict/1762866174 2025-12-04T09:17:19.2458660Z * [new tag] viable/strict/1762867653 -> viable/strict/1762867653 
2025-12-04T09:17:19.2460409Z * [new tag] viable/strict/1762872669 -> viable/strict/1762872669 2025-12-04T09:17:19.2461872Z * [new tag] viable/strict/1762878380 -> viable/strict/1762878380 2025-12-04T09:17:19.2463501Z * [new tag] viable/strict/1762889003 -> viable/strict/1762889003 2025-12-04T09:17:19.2465101Z * [new tag] viable/strict/1762890589 -> viable/strict/1762890589 2025-12-04T09:17:19.2466707Z * [new tag] viable/strict/1762892743 -> viable/strict/1762892743 2025-12-04T09:17:19.2468345Z * [new tag] viable/strict/1762894271 -> viable/strict/1762894271 2025-12-04T09:17:19.2469766Z * [new tag] viable/strict/1762896287 -> viable/strict/1762896287 2025-12-04T09:17:19.2471406Z * [new tag] viable/strict/1762915871 -> viable/strict/1762915871 2025-12-04T09:17:19.2473051Z * [new tag] viable/strict/1762918569 -> viable/strict/1762918569 2025-12-04T09:17:19.2474502Z * [new tag] viable/strict/1762919776 -> viable/strict/1762919776 2025-12-04T09:17:19.2476073Z * [new tag] viable/strict/1762923072 -> viable/strict/1762923072 2025-12-04T09:17:19.2477890Z * [new tag] viable/strict/1762928826 -> viable/strict/1762928826 2025-12-04T09:17:19.2479516Z * [new tag] viable/strict/1762930451 -> viable/strict/1762930451 2025-12-04T09:17:19.2481070Z * [new tag] viable/strict/1762933780 -> viable/strict/1762933780 2025-12-04T09:17:19.2482717Z * [new tag] viable/strict/1762937638 -> viable/strict/1762937638 2025-12-04T09:17:19.2484446Z * [new tag] viable/strict/1762939545 -> viable/strict/1762939545 2025-12-04T09:17:19.2486087Z * [new tag] viable/strict/1762962692 -> viable/strict/1762962692 2025-12-04T09:17:19.2487644Z * [new tag] viable/strict/1762979143 -> viable/strict/1762979143 2025-12-04T09:17:19.2489242Z * [new tag] viable/strict/1762984188 -> viable/strict/1762984188 2025-12-04T09:17:19.2490710Z * [new tag] viable/strict/1762986306 -> viable/strict/1762986306 2025-12-04T09:17:19.2492391Z * [new tag] viable/strict/1762989903 -> viable/strict/1762989903 
2025-12-04T09:17:19.2493977Z * [new tag] viable/strict/1762991377 -> viable/strict/1762991377 2025-12-04T09:17:19.2495575Z * [new tag] viable/strict/1762998921 -> viable/strict/1762998921 2025-12-04T09:17:19.2497260Z * [new tag] viable/strict/1763002287 -> viable/strict/1763002287 2025-12-04T09:17:19.2498997Z * [new tag] viable/strict/1763016840 -> viable/strict/1763016840 2025-12-04T09:17:19.2500676Z * [new tag] viable/strict/1763020180 -> viable/strict/1763020180 2025-12-04T09:17:19.2502387Z * [new tag] viable/strict/1763027421 -> viable/strict/1763027421 2025-12-04T09:17:19.2503979Z * [new tag] viable/strict/1763031120 -> viable/strict/1763031120 2025-12-04T09:17:19.2505704Z * [new tag] viable/strict/1763036861 -> viable/strict/1763036861 2025-12-04T09:17:19.2507404Z * [new tag] viable/strict/1763038993 -> viable/strict/1763038993 2025-12-04T09:17:19.2509785Z * [new tag] viable/strict/1763054703 -> viable/strict/1763054703 2025-12-04T09:17:19.2511253Z * [new tag] viable/strict/1763067061 -> viable/strict/1763067061 2025-12-04T09:17:19.2512904Z * [new tag] viable/strict/1763070847 -> viable/strict/1763070847 2025-12-04T09:17:19.2514526Z * [new tag] viable/strict/1763072706 -> viable/strict/1763072706 2025-12-04T09:17:19.2516265Z * [new tag] viable/strict/1763076302 -> viable/strict/1763076302 2025-12-04T09:17:19.2517967Z * [new tag] viable/strict/1763080816 -> viable/strict/1763080816 2025-12-04T09:17:19.2519552Z * [new tag] viable/strict/1763082732 -> viable/strict/1763082732 2025-12-04T09:17:19.2521130Z * [new tag] viable/strict/1763085329 -> viable/strict/1763085329 2025-12-04T09:17:19.2522796Z * [new tag] viable/strict/1763088623 -> viable/strict/1763088623 2025-12-04T09:17:19.2524539Z * [new tag] viable/strict/1763091402 -> viable/strict/1763091402 2025-12-04T09:17:19.2526124Z * [new tag] viable/strict/1763092602 -> viable/strict/1763092602 2025-12-04T09:17:19.2527715Z * [new tag] viable/strict/1763094355 -> viable/strict/1763094355 
2025-12-04T09:17:19.2529386Z * [new tag] viable/strict/1763099390 -> viable/strict/1763099390 2025-12-04T09:17:19.2530983Z * [new tag] viable/strict/1763101608 -> viable/strict/1763101608 2025-12-04T09:17:19.2532636Z * [new tag] viable/strict/1763105102 -> viable/strict/1763105102 2025-12-04T09:17:19.2534358Z * [new tag] viable/strict/1763112347 -> viable/strict/1763112347 2025-12-04T09:17:19.2536047Z * [new tag] viable/strict/1763119471 -> viable/strict/1763119471 2025-12-04T09:17:19.2537702Z * [new tag] viable/strict/1763126835 -> viable/strict/1763126835 2025-12-04T09:17:19.2539050Z * [new tag] viable/strict/1763149779 -> viable/strict/1763149779 2025-12-04T09:17:19.2540808Z * [new tag] viable/strict/1763164178 -> viable/strict/1763164178 2025-12-04T09:17:19.2542466Z * [new tag] viable/strict/1763167104 -> viable/strict/1763167104 2025-12-04T09:17:19.2544002Z * [new tag] viable/strict/1763169132 -> viable/strict/1763169132 2025-12-04T09:17:19.2545615Z * [new tag] viable/strict/1763171708 -> viable/strict/1763171708 2025-12-04T09:17:19.2547207Z * [new tag] viable/strict/1763174759 -> viable/strict/1763174759 2025-12-04T09:17:19.2548819Z * [new tag] viable/strict/1763180744 -> viable/strict/1763180744 2025-12-04T09:17:19.2550431Z * [new tag] viable/strict/1763182227 -> viable/strict/1763182227 2025-12-04T09:17:19.2552023Z * [new tag] viable/strict/1763184309 -> viable/strict/1763184309 2025-12-04T09:17:19.2554109Z * [new tag] viable/strict/1763187991 -> viable/strict/1763187991 2025-12-04T09:17:19.2555732Z * [new tag] viable/strict/1763191445 -> viable/strict/1763191445 2025-12-04T09:17:19.2557593Z * [new tag] viable/strict/1763195152 -> viable/strict/1763195152 2025-12-04T09:17:19.2559055Z * [new tag] viable/strict/1763205769 -> viable/strict/1763205769 2025-12-04T09:17:19.2560790Z * [new tag] viable/strict/1763246990 -> viable/strict/1763246990 2025-12-04T09:17:19.2562471Z * [new tag] viable/strict/1763261578 -> viable/strict/1763261578 
2025-12-04T09:17:19.2563994Z * [new tag] viable/strict/1763286573 -> viable/strict/1763286573 2025-12-04T09:17:19.2565430Z * [new tag] viable/strict/1763292167 -> viable/strict/1763292167 2025-12-04T09:17:19.2567082Z * [new tag] viable/strict/1763333386 -> viable/strict/1763333386 2025-12-04T09:17:19.2568694Z * [new tag] viable/strict/1763340082 -> viable/strict/1763340082 2025-12-04T09:17:19.2570971Z * [new tag] viable/strict/1763364324 -> viable/strict/1763364324 2025-12-04T09:17:19.2572637Z * [new tag] viable/strict/1763371569 -> viable/strict/1763371569 2025-12-04T09:17:19.2574222Z * [new tag] viable/strict/1763373067 -> viable/strict/1763373067 2025-12-04T09:17:19.2575825Z * [new tag] viable/strict/1763375157 -> viable/strict/1763375157 2025-12-04T09:17:19.2577447Z * [new tag] viable/strict/1763382462 -> viable/strict/1763382462 2025-12-04T09:17:19.2579221Z * [new tag] viable/strict/1763394661 -> viable/strict/1763394661 2025-12-04T09:17:19.2581007Z * [new tag] viable/strict/1763396797 -> viable/strict/1763396797 2025-12-04T09:17:19.2582701Z * [new tag] viable/strict/1763398542 -> viable/strict/1763398542 2025-12-04T09:17:19.2584371Z * [new tag] viable/strict/1763401807 -> viable/strict/1763401807 2025-12-04T09:17:19.2585815Z * [new tag] viable/strict/1763414698 -> viable/strict/1763414698 2025-12-04T09:17:19.2587653Z * [new tag] viable/strict/1763419807 -> viable/strict/1763419807 2025-12-04T09:17:19.2589331Z * [new tag] viable/strict/1763426369 -> viable/strict/1763426369 2025-12-04T09:17:19.2591021Z * [new tag] viable/strict/1763428331 -> viable/strict/1763428331 2025-12-04T09:17:19.2592693Z * [new tag] viable/strict/1763430922 -> viable/strict/1763430922 2025-12-04T09:17:19.2594151Z * [new tag] viable/strict/1763434184 -> viable/strict/1763434184 2025-12-04T09:17:19.2595776Z * [new tag] viable/strict/1763439973 -> viable/strict/1763439973 2025-12-04T09:17:19.2597545Z * [new tag] viable/strict/1763444995 -> viable/strict/1763444995 
2025-12-04T09:17:19.2598993Z * [new tag] viable/strict/1763447206 -> viable/strict/1763447206 2025-12-04T09:17:19.2600641Z * [new tag] viable/strict/1763448826 -> viable/strict/1763448826 2025-12-04T09:17:19.2602283Z * [new tag] viable/strict/1763450717 -> viable/strict/1763450717 2025-12-04T09:17:19.2603981Z * [new tag] viable/strict/1763452183 -> viable/strict/1763452183 2025-12-04T09:17:19.2605625Z * [new tag] viable/strict/1763457945 -> viable/strict/1763457945 2025-12-04T09:17:19.2607257Z * [new tag] viable/strict/1763459439 -> viable/strict/1763459439 2025-12-04T09:17:19.2608817Z * [new tag] viable/strict/1763461556 -> viable/strict/1763461556 2025-12-04T09:17:19.2613080Z * [new tag] viable/strict/1763463103 -> viable/strict/1763463103 2025-12-04T09:17:19.2614738Z * [new tag] viable/strict/1763465100 -> viable/strict/1763465100 2025-12-04T09:17:19.2616292Z * [new tag] viable/strict/1763468866 -> viable/strict/1763468866 2025-12-04T09:17:19.2618190Z * [new tag] viable/strict/1763493823 -> viable/strict/1763493823 2025-12-04T09:17:19.2619794Z * [new tag] viable/strict/1763496249 -> viable/strict/1763496249 2025-12-04T09:17:19.2621408Z * [new tag] viable/strict/1763502620 -> viable/strict/1763502620 2025-12-04T09:17:19.2623052Z * [new tag] viable/strict/1763504715 -> viable/strict/1763504715 2025-12-04T09:17:19.2624655Z * [new tag] viable/strict/1763506208 -> viable/strict/1763506208 2025-12-04T09:17:19.2626384Z * [new tag] viable/strict/1763520590 -> viable/strict/1763520590 2025-12-04T09:17:19.2627963Z * [new tag] viable/strict/1763523357 -> viable/strict/1763523357 2025-12-04T09:17:19.2629628Z * [new tag] viable/strict/1763529922 -> viable/strict/1763529922 2025-12-04T09:17:19.2631336Z * [new tag] viable/strict/1763531408 -> viable/strict/1763531408 2025-12-04T09:17:19.2632917Z * [new tag] viable/strict/1763533622 -> viable/strict/1763533622 2025-12-04T09:17:19.2634542Z * [new tag] viable/strict/1763538576 -> viable/strict/1763538576 
2025-12-04T09:17:19.2636207Z * [new tag] viable/strict/1763545823 -> viable/strict/1763545823 2025-12-04T09:17:19.2637685Z * [new tag] viable/strict/1763547951 -> viable/strict/1763547951 2025-12-04T09:17:19.2639385Z * [new tag] viable/strict/1763551477 -> viable/strict/1763551477 2025-12-04T09:17:19.2640952Z * [new tag] viable/strict/1763552982 -> viable/strict/1763552982 2025-12-04T09:17:19.2642582Z * [new tag] viable/strict/1763594698 -> viable/strict/1763594698 2025-12-04T09:17:19.2644184Z * [new tag] viable/strict/1763596178 -> viable/strict/1763596178 2025-12-04T09:17:19.2645798Z * [new tag] viable/strict/1763599155 -> viable/strict/1763599155 2025-12-04T09:17:19.2647365Z * [new tag] viable/strict/1763603717 -> viable/strict/1763603717 2025-12-04T09:17:19.2649021Z * [new tag] viable/strict/1763606923 -> viable/strict/1763606923 2025-12-04T09:17:19.2650639Z * [new tag] viable/strict/1763609715 -> viable/strict/1763609715 2025-12-04T09:17:19.2652243Z * [new tag] viable/strict/1763612757 -> viable/strict/1763612757 2025-12-04T09:17:19.2653816Z * [new tag] viable/strict/1763616325 -> viable/strict/1763616325 2025-12-04T09:17:19.2655427Z * [new tag] viable/strict/1763623509 -> viable/strict/1763623509 2025-12-04T09:17:19.2657141Z * [new tag] viable/strict/1763624984 -> viable/strict/1763624984 2025-12-04T09:17:19.2658900Z * [new tag] viable/strict/1763628796 -> viable/strict/1763628796 2025-12-04T09:17:19.2660454Z * [new tag] viable/strict/1763634343 -> viable/strict/1763634343 2025-12-04T09:17:19.2662015Z * [new tag] viable/strict/1763635867 -> viable/strict/1763635867 2025-12-04T09:17:19.2663806Z * [new tag] viable/strict/1763639382 -> viable/strict/1763639382 2025-12-04T09:17:19.2665348Z * [new tag] viable/strict/1763646626 -> viable/strict/1763646626 2025-12-04T09:17:19.2667174Z * [new tag] viable/strict/1763655997 -> viable/strict/1763655997 2025-12-04T09:17:19.2668892Z * [new tag] viable/strict/1763659444 -> viable/strict/1763659444 
2025-12-04T09:17:19.2670437Z * [new tag] viable/strict/1763660992 -> viable/strict/1763660992 2025-12-04T09:17:19.2671983Z * [new tag] viable/strict/1763663201 -> viable/strict/1763663201 2025-12-04T09:17:19.2673667Z * [new tag] viable/strict/1763670362 -> viable/strict/1763670362 2025-12-04T09:17:19.2675051Z * [new tag] viable/strict/1763675378 -> viable/strict/1763675378 2025-12-04T09:17:19.2676694Z * [new tag] viable/strict/1763693343 -> viable/strict/1763693343 2025-12-04T09:17:19.2678259Z * [new tag] viable/strict/1763696088 -> viable/strict/1763696088 2025-12-04T09:17:19.2679960Z * [new tag] viable/strict/1763697343 -> viable/strict/1763697343 2025-12-04T09:17:19.2681582Z * [new tag] viable/strict/1763699165 -> viable/strict/1763699165 2025-12-04T09:17:19.2683179Z * [new tag] viable/strict/1763700660 -> viable/strict/1763700660 2025-12-04T09:17:19.2684732Z * [new tag] viable/strict/1763704209 -> viable/strict/1763704209 2025-12-04T09:17:19.2686349Z * [new tag] viable/strict/1763706411 -> viable/strict/1763706411 2025-12-04T09:17:19.2687928Z * [new tag] viable/strict/1763708082 -> viable/strict/1763708082 2025-12-04T09:17:19.2689441Z * [new tag] viable/strict/1763711381 -> viable/strict/1763711381 2025-12-04T09:17:19.2690957Z * [new tag] viable/strict/1763713593 -> viable/strict/1763713593 2025-12-04T09:17:19.2692815Z * [new tag] viable/strict/1763715201 -> viable/strict/1763715201 2025-12-04T09:17:19.2694436Z * [new tag] viable/strict/1763733017 -> viable/strict/1763733017 2025-12-04T09:17:19.2696090Z * [new tag] viable/strict/1763735108 -> viable/strict/1763735108 2025-12-04T09:17:19.2697668Z * [new tag] viable/strict/1763749579 -> viable/strict/1763749579 2025-12-04T09:17:19.2699316Z * [new tag] viable/strict/1763751113 -> viable/strict/1763751113 2025-12-04T09:17:19.2701013Z * [new tag] viable/strict/1763753035 -> viable/strict/1763753035 2025-12-04T09:17:19.2702732Z * [new tag] viable/strict/1763754578 -> viable/strict/1763754578 
2025-12-04T09:17:19.2704279Z * [new tag] viable/strict/1763756748 -> viable/strict/1763756748 2025-12-04T09:17:19.2705835Z * [new tag] viable/strict/1763758205 -> viable/strict/1763758205 2025-12-04T09:17:19.2707285Z * [new tag] viable/strict/1763764050 -> viable/strict/1763764050 2025-12-04T09:17:19.2708920Z * [new tag] viable/strict/1763771887 -> viable/strict/1763771887 2025-12-04T09:17:19.2710786Z * [new tag] viable/strict/1763773920 -> viable/strict/1763773920 2025-12-04T09:17:19.2712334Z * [new tag] viable/strict/1763776501 -> viable/strict/1763776501 2025-12-04T09:17:19.2713901Z * [new tag] viable/strict/1763779437 -> viable/strict/1763779437 2025-12-04T09:17:19.2715751Z * [new tag] viable/strict/1763781038 -> viable/strict/1763781038 2025-12-04T09:17:19.2717365Z * [new tag] viable/strict/1763782245 -> viable/strict/1763782245 2025-12-04T09:17:19.2718820Z * [new tag] viable/strict/1763785568 -> viable/strict/1763785568 2025-12-04T09:17:19.2720564Z * [new tag] viable/strict/1763787006 -> viable/strict/1763787006 2025-12-04T09:17:19.2722739Z * [new tag] viable/strict/1763789103 -> viable/strict/1763789103 2025-12-04T09:17:19.2724242Z * [new tag] viable/strict/1763790578 -> viable/strict/1763790578 2025-12-04T09:17:19.2725842Z * [new tag] viable/strict/1763796275 -> viable/strict/1763796275 2025-12-04T09:17:19.2727690Z * [new tag] viable/strict/1763801465 -> viable/strict/1763801465 2025-12-04T09:17:19.2729338Z * [new tag] viable/strict/1763803522 -> viable/strict/1763803522 2025-12-04T09:17:19.2730861Z * [new tag] viable/strict/1763808581 -> viable/strict/1763808581 2025-12-04T09:17:19.2732453Z * [new tag] viable/strict/1763840977 -> viable/strict/1763840977 2025-12-04T09:17:19.2734041Z * [new tag] viable/strict/1763846659 -> viable/strict/1763846659 2025-12-04T09:17:19.2735670Z * [new tag] viable/strict/1763872065 -> viable/strict/1763872065 2025-12-04T09:17:19.2737297Z * [new tag] viable/strict/1763873648 -> viable/strict/1763873648 
2025-12-04T09:17:19.2738896Z * [new tag] viable/strict/1763875506 -> viable/strict/1763875506 2025-12-04T09:17:19.2740502Z * [new tag] viable/strict/1763889904 -> viable/strict/1763889904 2025-12-04T09:17:19.2742112Z * [new tag] viable/strict/1763930999 -> viable/strict/1763930999 2025-12-04T09:17:19.2743761Z * [new tag] viable/strict/1763944964 -> viable/strict/1763944964 2025-12-04T09:17:19.2745217Z * [new tag] viable/strict/1763958474 -> viable/strict/1763958474 2025-12-04T09:17:19.2746836Z * [new tag] viable/strict/1763967263 -> viable/strict/1763967263 2025-12-04T09:17:19.2748488Z * [new tag] viable/strict/1763972803 -> viable/strict/1763972803 2025-12-04T09:17:19.2750052Z * [new tag] viable/strict/1763976376 -> viable/strict/1763976376 2025-12-04T09:17:19.2751641Z * [new tag] viable/strict/1763989404 -> viable/strict/1763989404 2025-12-04T09:17:19.2753225Z * [new tag] viable/strict/1763990887 -> viable/strict/1763990887 2025-12-04T09:17:19.2754817Z * [new tag] viable/strict/1764019919 -> viable/strict/1764019919 2025-12-04T09:17:19.2756534Z * [new tag] viable/strict/1764023134 -> viable/strict/1764023134 2025-12-04T09:17:19.2757958Z * [new tag] viable/strict/1764024593 -> viable/strict/1764024593 2025-12-04T09:17:19.2759565Z * [new tag] viable/strict/1764026706 -> viable/strict/1764026706 2025-12-04T09:17:19.2761415Z * [new tag] viable/strict/1764031139 -> viable/strict/1764031139 2025-12-04T09:17:19.2763027Z * [new tag] viable/strict/1764033131 -> viable/strict/1764033131 2025-12-04T09:17:19.2764460Z * [new tag] viable/strict/1764035725 -> viable/strict/1764035725 2025-12-04T09:17:19.2765913Z * [new tag] viable/strict/1764624265 -> viable/strict/1764624265 2025-12-04T09:17:19.2767346Z * [new tag] viable/strict/1764631514 -> viable/strict/1764631514 2025-12-04T09:17:19.2768772Z * [new tag] viable/strict/1764632987 -> viable/strict/1764632987 2025-12-04T09:17:19.2770195Z * [new tag] viable/strict/1764636063 -> viable/strict/1764636063 
2025-12-04T09:17:19.2771766Z * [new tag] viable/strict/1764643975 -> viable/strict/1764643975 2025-12-04T09:17:19.2773190Z * [new tag] viable/strict/1764646859 -> viable/strict/1764646859 2025-12-04T09:17:19.2774724Z * [new tag] viable/strict/1764653120 -> viable/strict/1764653120 2025-12-04T09:17:19.2776038Z * [new tag] viable/strict/1764654632 -> viable/strict/1764654632 2025-12-04T09:17:19.2777462Z * [new tag] viable/strict/1764656821 -> viable/strict/1764656821 2025-12-04T09:17:19.2778910Z * [new tag] viable/strict/1764658557 -> viable/strict/1764658557 2025-12-04T09:17:19.2780410Z * [new tag] viable/strict/1764660333 -> viable/strict/1764660333 2025-12-04T09:17:19.2781836Z * [new tag] viable/strict/1764661812 -> viable/strict/1764661812 2025-12-04T09:17:19.2783286Z * [new tag] viable/strict/1764664023 -> viable/strict/1764664023 2025-12-04T09:17:19.2784676Z * [new tag] viable/strict/1764669150 -> viable/strict/1764669150 2025-12-04T09:17:19.2786193Z * [new tag] viable/strict/1764680709 -> viable/strict/1764680709 2025-12-04T09:17:19.2787597Z * [new tag] viable/strict/1764687619 -> viable/strict/1764687619 2025-12-04T09:17:19.2789087Z * [new tag] viable/strict/1764696355 -> viable/strict/1764696355 2025-12-04T09:17:19.2790474Z * [new tag] viable/strict/1764701767 -> viable/strict/1764701767 2025-12-04T09:17:19.2791903Z * [new tag] viable/strict/1764710768 -> viable/strict/1764710768 2025-12-04T09:17:19.2793535Z * [new tag] viable/strict/1764716202 -> viable/strict/1764716202 2025-12-04T09:17:19.2795043Z * [new tag] viable/strict/1764793566 -> viable/strict/1764793566 2025-12-04T09:17:19.2796464Z * [new tag] viable/strict/1764797093 -> viable/strict/1764797093 2025-12-04T09:17:19.2797906Z * [new tag] viable/strict/1764800729 -> viable/strict/1764800729 2025-12-04T09:17:19.2799436Z * [new tag] whc_flight_1 -> whc_flight_1 2025-12-04T09:17:19.2800952Z * [new tag] whc_flight_2 -> whc_flight_2 2025-12-04T09:17:19.2802649Z * [new tag] whc_flight_4 -> whc_flight_4 
2025-12-04T09:17:19.3987953Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T09:17:19.4020888Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:17:19.4026272Z ##[endgroup] 2025-12-04T09:17:19.4028164Z ##[group]Determining the checkout info 2025-12-04T09:17:19.4028604Z ##[endgroup] 2025-12-04T09:17:19.4032370Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T09:17:19.4076223Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T09:17:19.4113910Z ##[group]Checking out the ref 2025-12-04T09:17:19.4117060Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:17:20.4421649Z Updating files: 65% (13152/20121) 2025-12-04T09:17:20.4504366Z Updating files: 66% (13280/20121) 2025-12-04T09:17:20.4588111Z Updating files: 67% (13482/20121) 2025-12-04T09:17:20.4670370Z Updating files: 68% (13683/20121) 2025-12-04T09:17:20.4885645Z Updating files: 69% (13884/20121) 2025-12-04T09:17:20.5214487Z Updating files: 70% (14085/20121) 2025-12-04T09:17:20.5284690Z Updating files: 71% (14286/20121) 2025-12-04T09:17:20.5378381Z Updating files: 72% (14488/20121) 2025-12-04T09:17:20.5599628Z Updating files: 73% (14689/20121) 2025-12-04T09:17:20.5877753Z Updating files: 74% (14890/20121) 2025-12-04T09:17:20.6426132Z Updating files: 75% (15091/20121) 2025-12-04T09:17:20.6606363Z Updating files: 76% (15292/20121) 2025-12-04T09:17:20.6773579Z Updating files: 77% (15494/20121) 2025-12-04T09:17:20.7015687Z Updating files: 78% (15695/20121) 2025-12-04T09:17:20.7313800Z Updating files: 79% (15896/20121) 2025-12-04T09:17:20.7672526Z Updating files: 80% (16097/20121) 2025-12-04T09:17:20.7997629Z Updating files: 81% (16299/20121) 2025-12-04T09:17:20.8257952Z Updating files: 82% (16500/20121) 2025-12-04T09:17:20.8449324Z Updating files: 83% (16701/20121) 2025-12-04T09:17:20.8627408Z Updating files: 84% (16902/20121) 2025-12-04T09:17:20.8827040Z 
Updating files: 85% (17103/20121) 2025-12-04T09:17:20.9021215Z Updating files: 86% (17305/20121) 2025-12-04T09:17:20.9202423Z Updating files: 87% (17506/20121) 2025-12-04T09:17:20.9352950Z Updating files: 88% (17707/20121) 2025-12-04T09:17:20.9528662Z Updating files: 89% (17908/20121) 2025-12-04T09:17:20.9738791Z Updating files: 90% (18109/20121) 2025-12-04T09:17:20.9890545Z Updating files: 91% (18311/20121) 2025-12-04T09:17:21.0085587Z Updating files: 92% (18512/20121) 2025-12-04T09:17:21.0315837Z Updating files: 93% (18713/20121) 2025-12-04T09:17:21.0559317Z Updating files: 94% (18914/20121) 2025-12-04T09:17:21.0771075Z Updating files: 95% (19115/20121) 2025-12-04T09:17:21.0969249Z Updating files: 96% (19317/20121) 2025-12-04T09:17:21.1171771Z Updating files: 97% (19518/20121) 2025-12-04T09:17:21.1497208Z Updating files: 98% (19719/20121) 2025-12-04T09:17:21.1711534Z Updating files: 99% (19920/20121) 2025-12-04T09:17:21.1712322Z Updating files: 100% (20121/20121) 2025-12-04T09:17:21.1712889Z Updating files: 100% (20121/20121), done. 2025-12-04T09:17:21.1997896Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'. 2025-12-04T09:17:21.1998339Z 2025-12-04T09:17:21.1998636Z You are in 'detached HEAD' state. You can look around, make experimental 2025-12-04T09:17:21.1999326Z changes and commit them, and you can discard any commits you make in this 2025-12-04T09:17:21.1999866Z state without impacting any branches by switching back to a branch. 2025-12-04T09:17:21.2000195Z 2025-12-04T09:17:21.2000396Z If you want to create a new branch to retain commits you create, you may 2025-12-04T09:17:21.2000896Z do so (now or later) by using -c with the switch command. 
Example: 2025-12-04T09:17:21.2001181Z 2025-12-04T09:17:21.2001298Z git switch -c 2025-12-04T09:17:21.2001492Z 2025-12-04T09:17:21.2001598Z Or undo this operation with: 2025-12-04T09:17:21.2001786Z 2025-12-04T09:17:21.2001870Z git switch - 2025-12-04T09:17:21.2001995Z 2025-12-04T09:17:21.2002232Z Turn off this advice by setting config variable advice.detachedHead to false 2025-12-04T09:17:21.2002578Z 2025-12-04T09:17:21.2003583Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T09:17:21.2186787Z ##[endgroup] 2025-12-04T09:17:21.2187305Z ##[group]Setting up auth for fetching submodules 2025-12-04T09:17:21.2195703Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T09:17:21.2255054Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T09:17:21.2291022Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T09:17:21.2325202Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T09:17:21.2359179Z ##[endgroup] 2025-12-04T09:17:21.2359564Z ##[group]Fetching submodules 2025-12-04T09:17:21.2362030Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T09:17:21.2775899Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T09:17:21.3178690Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2025-12-04T09:17:21.3181120Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2025-12-04T09:17:21.3184836Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2025-12-04T09:17:21.3188543Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) 
registered for path 'third_party/NNPACK' 2025-12-04T09:17:21.3192340Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX' 2025-12-04T09:17:21.3197032Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2025-12-04T09:17:21.3200474Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2025-12-04T09:17:21.3204696Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter' 2025-12-04T09:17:21.3209605Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-12-04T09:17:21.3214049Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-12-04T09:17:21.3218332Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-12-04T09:17:21.3223549Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-12-04T09:17:21.3228025Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-12-04T09:17:21.3232498Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-12-04T09:17:21.3237130Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-12-04T09:17:21.3241993Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-12-04T09:17:21.3250134Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) 
registered for path 'third_party/flatbuffers' 2025-12-04T09:17:21.3254940Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-12-04T09:17:21.3260081Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:17:21.3264945Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-12-04T09:17:21.3270112Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-12-04T09:17:21.3275242Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-12-04T09:17:21.3283588Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-12-04T09:17:21.3286991Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2025-12-04T09:17:21.3292565Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-12-04T09:17:21.3298365Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-12-04T09:17:21.3304871Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-12-04T09:17:21.3311246Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-12-04T09:17:21.3317542Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-12-04T09:17:21.3323341Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-12-04T09:17:21.3329499Z Submodule 'third_party/protobuf' 
(https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-12-04T09:17:21.3335762Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-12-04T09:17:21.3343253Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-12-04T09:17:21.3352574Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-12-04T09:17:21.3358806Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-12-04T09:17:21.3365312Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-12-04T09:17:21.3372073Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2025-12-04T09:17:21.3413445Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2025-12-04T09:17:21.5842894Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2025-12-04T09:17:21.5843665Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2025-12-04T09:17:21.5844353Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2025-12-04T09:17:21.5877493Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2025-12-04T09:17:24.5513905Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2025-12-04T09:17:24.5515287Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'... 2025-12-04T09:17:24.5516725Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 
2025-12-04T09:17:24.5517859Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-12-04T09:17:24.5519125Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-12-04T09:17:24.5520436Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'... 2025-12-04T09:17:24.5521852Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-12-04T09:17:24.5523150Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2025-12-04T09:17:24.5524295Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'... 2025-12-04T09:17:24.5525589Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'... 2025-12-04T09:17:24.5527403Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2025-12-04T09:17:24.5528619Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-12-04T09:17:24.5529867Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2025-12-04T09:17:24.5530927Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2025-12-04T09:17:24.5532274Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'... 2025-12-04T09:17:24.5533601Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-12-04T09:17:24.5746177Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-12-04T09:17:24.7118951Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-12-04T09:17:24.8135298Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 
2025-12-04T09:17:24.9009903Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2025-12-04T09:17:24.9937608Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2025-12-04T09:17:27.4113030Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2025-12-04T09:17:27.4114446Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2025-12-04T09:17:27.4116001Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 2025-12-04T09:17:27.4117804Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2025-12-04T09:17:27.4120624Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'... 2025-12-04T09:17:27.5114343Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2025-12-04T09:17:45.6006170Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-12-04T09:17:45.6008722Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'... 2025-12-04T09:17:45.6010788Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 2025-12-04T09:17:45.6012756Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'... 2025-12-04T09:17:45.6013740Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 
2025-12-04T09:17:45.6226935Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T09:17:45.6400220Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T09:17:45.6540943Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T09:17:45.6894536Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T09:17:45.7964347Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T09:17:45.8628159Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T09:17:46.8699754Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T09:17:47.0969844Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T09:17:47.0998298Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:17:47.1033984Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 
2025-12-04T09:17:52.4427811Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T09:17:52.4754401Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T09:17:52.9519214Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T09:17:53.0153184Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T09:17:53.1341373Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T09:17:53.1957918Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T09:17:54.0361602Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T09:17:54.2388768Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T09:17:54.2419812Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-12-04T09:17:54.2422821Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:17:54.2426631Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:17:54.2430830Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-12-04T09:17:54.2435239Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-12-04T09:17:54.2439450Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 
'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:17:54.2443253Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-12-04T09:17:54.2478589Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-12-04T09:17:55.4861405Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-12-04T09:17:55.4862413Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-12-04T09:17:55.4863698Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-12-04T09:17:55.5863177Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-12-04T09:17:59.2034016Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-12-04T09:17:59.3035466Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
2025-12-04T09:18:02.2206475Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T09:18:02.6946540Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T09:18:02.8163612Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T09:18:03.6352233Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T09:18:03.6917572Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:18:03.7079780Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T09:18:03.8426534Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T09:18:03.9381766Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T09:18:03.9408672Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:18:03.9411655Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:18:03.9448609Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-12-04T09:18:08.7141094Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 
2025-12-04T09:18:09.0518540Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T09:18:09.7785233Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T09:18:09.9663648Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T09:18:10.0042793Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T09:18:10.0528400Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T09:18:10.0884465Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T09:18:10.1445469Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:18:10.1626547Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T09:18:10.1651294Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-12-04T09:18:10.1685603Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-12-04T09:18:28.2784813Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T09:18:28.3063562Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T09:18:28.4097182Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T09:18:28.4124998Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:18:28.4128309Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:18:28.4132325Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:18:28.4168690Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-12-04T09:18:29.1555059Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-12-04T09:18:29.7559748Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2025-12-04T09:18:29.8660365Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T09:18:29.8684313Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:18:29.8688332Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:18:29.8692375Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:18:29.8696581Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:18:29.8700879Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:18:29.8705354Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:18:29.8710179Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:18:29.8714527Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:18:29.8719236Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:18:29.8753826Z Cloning into 
'/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-12-04T09:18:31.8046819Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 2025-12-04T09:18:31.8048232Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'... 2025-12-04T09:18:31.8049840Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-12-04T09:18:31.8051123Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 2025-12-04T09:18:31.8052633Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-12-04T09:18:31.8053989Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-12-04T09:18:31.8055319Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-12-04T09:18:31.9047801Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 
2025-12-04T09:18:37.8830066Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T09:18:37.9085425Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T09:18:37.9547686Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T09:18:37.9739826Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T09:18:37.9762873Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:18:37.9797596Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-12-04T09:18:38.2686040Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T09:18:38.2938505Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T09:18:38.3504032Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:18:38.4795156Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T09:18:38.5021197Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T09:18:38.5265343Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T09:18:38.5288857Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:18:38.5292104Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:18:38.5328313Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T09:18:40.7925230Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'... 
2025-12-04T09:18:41.0864212Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T09:18:41.1443246Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T09:18:41.1855486Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T09:18:41.2423729Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:18:41.3144499Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T09:18:41.3658298Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T09:18:41.5013887Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T09:18:42.1317714Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T09:18:42.1361614Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-12-04T09:18:42.1397339Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
2025-12-04T09:18:43.0646174Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T09:18:43.1644751Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T09:18:43.1673537Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:18:43.1676465Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:18:43.1680291Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:18:43.1684440Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:18:43.1688659Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:18:43.1692726Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:18:43.1696957Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:18:43.1701218Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:18:43.1736643Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 
2025-12-04T09:18:43.6221425Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-12-04T09:18:43.6223141Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-12-04T09:18:43.6224752Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-12-04T09:18:43.6226236Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-12-04T09:18:43.7223052Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 2025-12-04T09:18:44.4338067Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-12-04T09:18:51.5171887Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 
2025-12-04T09:18:52.2572918Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T09:18:52.3082836Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T09:18:52.3311881Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T09:18:52.4663713Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T09:18:52.4852212Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T09:18:52.5064776Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T09:18:52.5293726Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T09:18:52.5319217Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:18:52.5321821Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:18:52.5355739Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T09:18:54.7770210Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 
2025-12-04T09:18:55.0695181Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T09:18:55.1270687Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T09:18:55.8581130Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T09:18:55.8746388Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T09:18:56.2144877Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T09:18:56.2175579Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:18:56.2178365Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-12-04T09:18:56.2213100Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2025-12-04T09:18:56.7713700Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2025-12-04T09:18:57.2251570Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T09:18:57.3126600Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T09:18:57.3264142Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T09:18:57.3435073Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T09:18:57.3997459Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T09:18:57.4370135Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T09:18:57.4916511Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T09:18:57.5308091Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T09:18:57.5336957Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:18:57.5338389Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:18:57.5341242Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:18:57.5345174Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:18:57.5381688Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2025-12-04T09:18:58.7351989Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 
2025-12-04T09:18:58.7353029Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-12-04T09:18:58.7648530Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-12-04T09:18:58.8339838Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T09:18:58.8562253Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T09:18:58.9464575Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T09:18:58.9851516Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T09:18:58.9874614Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:18:58.9909528Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2025-12-04T09:18:59.1809651Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T09:18:59.1861435Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T09:18:59.2263686Z Entering 'android/libs/fbjni' 2025-12-04T09:18:59.2325717Z Entering 'third_party/FP16' 2025-12-04T09:18:59.2391081Z Entering 'third_party/FXdiv' 2025-12-04T09:18:59.2453331Z Entering 'third_party/NNPACK' 2025-12-04T09:18:59.2512765Z Entering 'third_party/NVTX' 2025-12-04T09:18:59.2573595Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:18:59.2634255Z Entering 'third_party/XNNPACK' 2025-12-04T09:18:59.2709884Z Entering 'third_party/aiter' 2025-12-04T09:18:59.2771834Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:18:59.2842803Z Entering 'third_party/benchmark' 2025-12-04T09:18:59.2903795Z Entering 'third_party/composable_kernel' 2025-12-04T09:18:59.2972698Z Entering 'third_party/cpp-httplib' 2025-12-04T09:18:59.3032700Z Entering 'third_party/cpuinfo' 2025-12-04T09:18:59.3094052Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:18:59.3153659Z Entering 'third_party/cutlass' 2025-12-04T09:18:59.3223410Z Entering 'third_party/fbgemm' 2025-12-04T09:18:59.3282792Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:18:59.3340804Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:18:59.3410840Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:18:59.3471362Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:18:59.3538081Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:18:59.3594871Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:18:59.3651096Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:18:59.3713871Z Entering 'third_party/flash-attention' 2025-12-04T09:18:59.3772413Z Entering 'third_party/flash-attention/csrc/composable_kernel' 
2025-12-04T09:18:59.3835959Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:18:59.3904142Z Entering 'third_party/flatbuffers' 2025-12-04T09:18:59.3965923Z Entering 'third_party/fmt' 2025-12-04T09:18:59.4032597Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:18:59.4091443Z Entering 'third_party/gloo' 2025-12-04T09:18:59.4152441Z Entering 'third_party/googletest' 2025-12-04T09:18:59.4213106Z Entering 'third_party/ideep' 2025-12-04T09:18:59.4269761Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:18:59.4336699Z Entering 'third_party/ittapi' 2025-12-04T09:18:59.4399577Z Entering 'third_party/kineto' 2025-12-04T09:18:59.4461791Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:18:59.4518021Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:18:59.4576822Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:18:59.4636837Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:18:59.4695125Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:18:59.4760674Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:18:59.4822886Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:18:59.4880439Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:18:59.4939340Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:18:59.5004337Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:18:59.5061823Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:18:59.5119048Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:18:59.5180867Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:18:59.5248240Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:18:59.5311168Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:18:59.5373380Z Entering 'third_party/kleidiai' 2025-12-04T09:18:59.5435197Z Entering 'third_party/mimalloc' 2025-12-04T09:18:59.5494813Z Entering 'third_party/nlohmann' 2025-12-04T09:18:59.5557303Z Entering 'third_party/onnx' 2025-12-04T09:18:59.5635142Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:18:59.5698676Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:18:59.5761425Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:18:59.5819797Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:18:59.5875860Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:18:59.5932564Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:18:59.5991141Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:18:59.6047688Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:18:59.6109828Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:18:59.6164658Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:18:59.6225815Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:18:59.6290317Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:18:59.6372120Z Entering 'third_party/pocketfft' 2025-12-04T09:18:59.6434895Z Entering 'third_party/protobuf' 2025-12-04T09:18:59.6496634Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:18:59.6555749Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:18:59.6618028Z Entering 'third_party/psimd' 
2025-12-04T09:18:59.6678736Z Entering 'third_party/pthreadpool' 2025-12-04T09:18:59.6738427Z Entering 'third_party/pybind11' 2025-12-04T09:18:59.6798761Z Entering 'third_party/python-peachpy' 2025-12-04T09:18:59.6859217Z Entering 'third_party/sleef' 2025-12-04T09:18:59.6919020Z Entering 'third_party/tensorpipe' 2025-12-04T09:18:59.6978790Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:18:59.7036642Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:18:59.7093333Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:18:59.7152019Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:18:59.7207632Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:18:59.7291464Z ##[endgroup] 2025-12-04T09:18:59.7292119Z ##[group]Persisting credentials for submodules 2025-12-04T09:18:59.7297677Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T09:18:59.7690229Z Entering 'android/libs/fbjni' 2025-12-04T09:18:59.7769331Z Entering 'third_party/FP16' 2025-12-04T09:18:59.7850377Z Entering 'third_party/FXdiv' 2025-12-04T09:18:59.7928954Z Entering 'third_party/NNPACK' 2025-12-04T09:18:59.8007482Z Entering 'third_party/NVTX' 2025-12-04T09:18:59.8089019Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:18:59.8168347Z Entering 'third_party/XNNPACK' 2025-12-04T09:18:59.8262015Z Entering 'third_party/aiter' 2025-12-04T09:18:59.8342396Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:18:59.8431800Z Entering 'third_party/benchmark' 2025-12-04T09:18:59.8512447Z Entering 'third_party/composable_kernel' 2025-12-04T09:18:59.8602503Z Entering 'third_party/cpp-httplib' 2025-12-04T09:18:59.8682100Z Entering 'third_party/cpuinfo' 2025-12-04T09:18:59.8761925Z Entering 
'third_party/cudnn_frontend' 2025-12-04T09:18:59.8841156Z Entering 'third_party/cutlass' 2025-12-04T09:18:59.8931329Z Entering 'third_party/fbgemm' 2025-12-04T09:18:59.9015808Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:18:59.9091495Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:18:59.9177453Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:18:59.9253865Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:18:59.9344272Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:18:59.9422159Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:18:59.9497093Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:18:59.9579192Z Entering 'third_party/flash-attention' 2025-12-04T09:18:59.9658709Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:18:59.9743985Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:18:59.9830659Z Entering 'third_party/flatbuffers' 2025-12-04T09:18:59.9914156Z Entering 'third_party/fmt' 2025-12-04T09:18:59.9992603Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:19:00.0073564Z Entering 'third_party/gloo' 2025-12-04T09:19:00.0153280Z Entering 'third_party/googletest' 2025-12-04T09:19:00.0232764Z Entering 'third_party/ideep' 2025-12-04T09:19:00.0310434Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:19:00.0396439Z Entering 'third_party/ittapi' 2025-12-04T09:19:00.0475312Z Entering 'third_party/kineto' 2025-12-04T09:19:00.0555144Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:19:00.0632721Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:19:00.0711644Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:19:00.0789065Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:19:00.0867001Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 
2025-12-04T09:19:00.0944604Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:19:00.1025400Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:19:00.1102126Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:19:00.1183588Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:19:00.1262612Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:19:00.1345144Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:19:00.1422433Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:19:00.1506281Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:19:00.1591394Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:19:00.1667708Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:19:00.1746601Z Entering 'third_party/kleidiai' 2025-12-04T09:19:00.1828666Z Entering 'third_party/mimalloc' 2025-12-04T09:19:00.1909015Z Entering 'third_party/nlohmann' 2025-12-04T09:19:00.1989590Z Entering 'third_party/onnx' 2025-12-04T09:19:00.2084578Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:19:00.2171282Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:19:00.2251378Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:19:00.2328510Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:19:00.2411934Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:19:00.2492945Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:19:00.2574271Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:19:00.2649701Z 
Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:19:00.2726289Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:19:00.2800734Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:19:00.2879894Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:19:00.2958913Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:19:00.3058015Z Entering 'third_party/pocketfft' 2025-12-04T09:19:00.3137583Z Entering 'third_party/protobuf' 2025-12-04T09:19:00.3220272Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:19:00.3298688Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:19:00.3378765Z Entering 'third_party/psimd' 2025-12-04T09:19:00.3457121Z Entering 'third_party/pthreadpool' 2025-12-04T09:19:00.3535844Z Entering 'third_party/pybind11' 2025-12-04T09:19:00.3615203Z Entering 'third_party/python-peachpy' 2025-12-04T09:19:00.3693756Z Entering 'third_party/sleef' 2025-12-04T09:19:00.3773852Z Entering 'third_party/tensorpipe' 2025-12-04T09:19:00.3853866Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:19:00.3935043Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:19:00.4014422Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:19:00.4089663Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:19:00.4170918Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:19:00.4279386Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T09:19:00.4676802Z Entering 'android/libs/fbjni' 2025-12-04T09:19:00.4750373Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T09:19:00.4777067Z Entering 'third_party/FP16' 2025-12-04T09:19:00.4848869Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T09:19:00.4875945Z Entering 'third_party/FXdiv' 2025-12-04T09:19:00.4963938Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T09:19:00.4992000Z Entering 'third_party/NNPACK' 2025-12-04T09:19:00.5067234Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T09:19:00.5093905Z Entering 'third_party/NVTX' 2025-12-04T09:19:00.5168074Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T09:19:00.5193368Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:19:00.5265797Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T09:19:00.5289356Z Entering 'third_party/XNNPACK' 2025-12-04T09:19:00.5365535Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T09:19:00.5404978Z Entering 'third_party/aiter' 2025-12-04T09:19:00.5485677Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T09:19:00.5512875Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:19:00.5584982Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T09:19:00.5619620Z Entering 'third_party/benchmark' 2025-12-04T09:19:00.5688655Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T09:19:00.5714518Z Entering 'third_party/composable_kernel' 2025-12-04T09:19:00.5792871Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T09:19:00.5827096Z Entering 'third_party/cpp-httplib' 2025-12-04T09:19:00.5899819Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T09:19:00.5925773Z Entering 'third_party/cpuinfo' 2025-12-04T09:19:00.5998159Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T09:19:00.6026073Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:19:00.6099430Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T09:19:00.6124112Z Entering 'third_party/cutlass' 2025-12-04T09:19:00.6192834Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T09:19:00.6232104Z Entering 'third_party/fbgemm' 2025-12-04T09:19:00.6303157Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T09:19:00.6331037Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:19:00.6401117Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T09:19:00.6426670Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:19:00.6500576Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T09:19:00.6532744Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:19:00.6604633Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T09:19:00.6629517Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:19:00.6697563Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T09:19:00.6732105Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:19:00.6803520Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T09:19:00.6831973Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:19:00.6903143Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T09:19:00.6926744Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:19:00.7002287Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T09:19:00.7033518Z Entering 'third_party/flash-attention' 2025-12-04T09:19:00.7103625Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T09:19:00.7128379Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:19:00.7205035Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T09:19:00.7239572Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:19:00.7312167Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T09:19:00.7347118Z Entering 'third_party/flatbuffers' 2025-12-04T09:19:00.7420893Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T09:19:00.7449470Z Entering 'third_party/fmt' 2025-12-04T09:19:00.7525315Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T09:19:00.7551114Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:19:00.7624582Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T09:19:00.7649085Z Entering 'third_party/gloo' 2025-12-04T09:19:00.7724126Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T09:19:00.7750261Z Entering 'third_party/googletest' 2025-12-04T09:19:00.7831025Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:19:00.7855779Z Entering 'third_party/ideep' 2025-12-04T09:19:00.7933928Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T09:19:00.7959476Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:19:00.8031234Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T09:19:00.8065909Z Entering 'third_party/ittapi' 2025-12-04T09:19:00.8136591Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T09:19:00.8162923Z Entering 'third_party/kineto' 2025-12-04T09:19:00.8237020Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T09:19:00.8261507Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:19:00.8335021Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T09:19:00.8357030Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:19:00.8439350Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T09:19:00.8465884Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:19:00.8549383Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T09:19:00.8574111Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:19:00.8646586Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T09:19:00.8670671Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:19:00.8742485Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T09:19:00.8764185Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:19:00.8835367Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T09:19:00.8863574Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:19:00.8935909Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T09:19:00.8959892Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:19:00.9033886Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:19:00.9058960Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:19:00.9131030Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T09:19:00.9155775Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:19:00.9228505Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T09:19:00.9254383Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:19:00.9328131Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T09:19:00.9351445Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:19:00.9424388Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T09:19:00.9451258Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:19:00.9524530Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T09:19:00.9559127Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:19:00.9630289Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T09:19:00.9653492Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:19:00.9723639Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T09:19:00.9752366Z Entering 'third_party/kleidiai' 2025-12-04T09:19:00.9824640Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T09:19:00.9850468Z Entering 'third_party/mimalloc' 2025-12-04T09:19:00.9926244Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T09:19:00.9951676Z Entering 'third_party/nlohmann' 2025-12-04T09:19:01.0026624Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T09:19:01.0052363Z Entering 'third_party/onnx' 2025-12-04T09:19:01.0125124Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T09:19:01.0166404Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:19:01.0238539Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T09:19:01.0268181Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:19:01.0340627Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T09:19:01.0366105Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:19:01.0435453Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T09:19:01.0458651Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:19:01.0532324Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:19:01.0556018Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:19:01.0627692Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T09:19:01.0651346Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:19:01.0727496Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T09:19:01.0754487Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:19:01.0828161Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T09:19:01.0850958Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:19:01.0926496Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T09:19:01.0950506Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:19:01.1029535Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T09:19:01.1043132Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:19:01.1116355Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T09:19:01.1142788Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:19:01.1216519Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T09:19:01.1243474Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:19:01.1318300Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T09:19:01.1365625Z Entering 'third_party/pocketfft' 2025-12-04T09:19:01.1438654Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T09:19:01.1461792Z Entering 'third_party/protobuf' 2025-12-04T09:19:01.1533786Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T09:19:01.1560859Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:19:01.1633536Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T09:19:01.1657906Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:19:01.1732131Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 
2025-12-04T09:19:01.1760057Z Entering 'third_party/psimd' 2025-12-04T09:19:01.1830682Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T09:19:01.1856578Z Entering 'third_party/pthreadpool' 2025-12-04T09:19:01.1925966Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T09:19:01.1951380Z Entering 'third_party/pybind11' 2025-12-04T09:19:01.2022038Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T09:19:01.2047601Z Entering 'third_party/python-peachpy' 2025-12-04T09:19:01.2120430Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T09:19:01.2145413Z Entering 'third_party/sleef' 2025-12-04T09:19:01.2218915Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T09:19:01.2243439Z Entering 'third_party/tensorpipe' 2025-12-04T09:19:01.2314371Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T09:19:01.2337926Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:19:01.2407492Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:19:01.2433584Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:19:01.2503592Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T09:19:01.2527053Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:19:01.2596914Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T09:19:01.2620015Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:19:01.2689366Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T09:19:01.2710910Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:19:01.2784440Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T09:19:01.3971496Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T09:19:01.4369592Z Entering 'android/libs/fbjni' 2025-12-04T09:19:01.4429854Z Entering 'third_party/FP16' 2025-12-04T09:19:01.4490368Z Entering 'third_party/FXdiv' 2025-12-04T09:19:01.4554428Z Entering 'third_party/NNPACK' 2025-12-04T09:19:01.4615654Z Entering 'third_party/NVTX' 2025-12-04T09:19:01.4676990Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:19:01.4736946Z Entering 'third_party/XNNPACK' 2025-12-04T09:19:01.4819142Z Entering 'third_party/aiter' 2025-12-04T09:19:01.4878326Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:19:01.4948106Z Entering 'third_party/benchmark' 2025-12-04T09:19:01.5008826Z Entering 'third_party/composable_kernel' 2025-12-04T09:19:01.5077743Z Entering 'third_party/cpp-httplib' 2025-12-04T09:19:01.5138263Z Entering 'third_party/cpuinfo' 2025-12-04T09:19:01.5199359Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:19:01.5261025Z Entering 'third_party/cutlass' 2025-12-04T09:19:01.5333067Z Entering 'third_party/fbgemm' 2025-12-04T09:19:01.5396986Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:19:01.5454214Z Entering 
'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:19:01.5519441Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:19:01.5577757Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:19:01.5643725Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:19:01.5700895Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:19:01.5761623Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:19:01.5825114Z Entering 'third_party/flash-attention' 2025-12-04T09:19:01.5884591Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:19:01.5948060Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:19:01.6017848Z Entering 'third_party/flatbuffers' 2025-12-04T09:19:01.6080985Z Entering 'third_party/fmt' 2025-12-04T09:19:01.6140333Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:19:01.6203094Z Entering 'third_party/gloo' 2025-12-04T09:19:01.6263601Z Entering 'third_party/googletest' 2025-12-04T09:19:01.6324367Z Entering 'third_party/ideep' 2025-12-04T09:19:01.6383512Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:19:01.6450947Z Entering 'third_party/ittapi' 2025-12-04T09:19:01.6511288Z Entering 'third_party/kineto' 2025-12-04T09:19:01.6569795Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:19:01.6632313Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:19:01.6692548Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:19:01.6750253Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:19:01.6808979Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:19:01.6865329Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:19:01.6928130Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:19:01.6985083Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:19:01.7043885Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:19:01.7102232Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:19:01.7161130Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:19:01.7218569Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:19:01.7280576Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:19:01.7348541Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:19:01.7405318Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:19:01.7469023Z Entering 'third_party/kleidiai' 2025-12-04T09:19:01.7530322Z Entering 'third_party/mimalloc' 2025-12-04T09:19:01.7590391Z Entering 'third_party/nlohmann' 2025-12-04T09:19:01.7654987Z Entering 'third_party/onnx' 2025-12-04T09:19:01.7729744Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:19:01.7794357Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:19:01.7857188Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:19:01.7915506Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:19:01.7972882Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:19:01.8029442Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:19:01.8087254Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:19:01.8145140Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:19:01.8202909Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:19:01.8259570Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:19:01.8318694Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:19:01.8378770Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:19:01.8458362Z Entering 'third_party/pocketfft' 2025-12-04T09:19:01.8518650Z Entering 'third_party/protobuf' 2025-12-04T09:19:01.8580987Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:19:01.8639802Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:19:01.8700853Z Entering 'third_party/psimd' 2025-12-04T09:19:01.8762177Z Entering 'third_party/pthreadpool' 2025-12-04T09:19:01.8823567Z Entering 'third_party/pybind11' 2025-12-04T09:19:01.8883983Z Entering 'third_party/python-peachpy' 2025-12-04T09:19:01.8944900Z Entering 'third_party/sleef' 2025-12-04T09:19:01.9004608Z Entering 'third_party/tensorpipe' 2025-12-04T09:19:01.9068839Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:19:01.9125552Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:19:01.9184955Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:19:01.9242379Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:19:01.9301399Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:19:01.9390995Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T09:19:01.9788157Z Entering 'android/libs/fbjni' 2025-12-04T09:19:01.9849774Z Entering 'third_party/FP16' 2025-12-04T09:19:01.9912026Z Entering 'third_party/FXdiv' 2025-12-04T09:19:01.9977154Z Entering 'third_party/NNPACK' 2025-12-04T09:19:02.0038496Z Entering 'third_party/NVTX' 2025-12-04T09:19:02.0099374Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:19:02.0160767Z Entering 'third_party/XNNPACK' 2025-12-04T09:19:02.0238148Z Entering 
'third_party/aiter' 2025-12-04T09:19:02.0300824Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:19:02.0369821Z Entering 'third_party/benchmark' 2025-12-04T09:19:02.0433169Z Entering 'third_party/composable_kernel' 2025-12-04T09:19:02.0502412Z Entering 'third_party/cpp-httplib' 2025-12-04T09:19:02.0564810Z Entering 'third_party/cpuinfo' 2025-12-04T09:19:02.0626451Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:19:02.0687284Z Entering 'third_party/cutlass' 2025-12-04T09:19:02.0758360Z Entering 'third_party/fbgemm' 2025-12-04T09:19:02.0821555Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:19:02.0879042Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:19:02.0945776Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:19:02.1002037Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:19:02.1070066Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:19:02.1128372Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:19:02.1184033Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:19:02.1246039Z Entering 'third_party/flash-attention' 2025-12-04T09:19:02.1305104Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:19:02.1371795Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:19:02.1440620Z Entering 'third_party/flatbuffers' 2025-12-04T09:19:02.1506500Z Entering 'third_party/fmt' 2025-12-04T09:19:02.1567517Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:19:02.1635448Z Entering 'third_party/gloo' 2025-12-04T09:19:02.1695832Z Entering 'third_party/googletest' 2025-12-04T09:19:02.1756845Z Entering 'third_party/ideep' 2025-12-04T09:19:02.1815441Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:19:02.1884760Z Entering 'third_party/ittapi' 2025-12-04T09:19:02.1945656Z Entering 'third_party/kineto' 2025-12-04T09:19:02.2004545Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 
2025-12-04T09:19:02.2061577Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:19:02.2123229Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:19:02.2180736Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:19:02.2239971Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:19:02.2295655Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:19:02.2358996Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:19:02.2416250Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:19:02.2475803Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:19:02.2535795Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:19:02.2595010Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:19:02.2662264Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:19:02.2721322Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:19:02.2789509Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:19:02.2851678Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:19:02.2913352Z Entering 'third_party/kleidiai' 2025-12-04T09:19:02.2973529Z Entering 'third_party/mimalloc' 2025-12-04T09:19:02.3033926Z Entering 'third_party/nlohmann' 2025-12-04T09:19:02.3095427Z Entering 'third_party/onnx' 2025-12-04T09:19:02.3172025Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:19:02.3236698Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:19:02.3298251Z Entering 
'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:19:02.3356190Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:19:02.3415615Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:19:02.3470893Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:19:02.3530003Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:19:02.3589249Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:19:02.3645924Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:19:02.3700697Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:19:02.3762825Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:19:02.3824290Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:19:02.3904757Z Entering 'third_party/pocketfft' 2025-12-04T09:19:02.3965500Z Entering 'third_party/protobuf' 2025-12-04T09:19:02.4027773Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:19:02.4085040Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:19:02.4151464Z Entering 'third_party/psimd' 2025-12-04T09:19:02.4212224Z Entering 'third_party/pthreadpool' 2025-12-04T09:19:02.4272482Z Entering 'third_party/pybind11' 2025-12-04T09:19:02.4333302Z Entering 'third_party/python-peachpy' 2025-12-04T09:19:02.4393775Z Entering 'third_party/sleef' 2025-12-04T09:19:02.4455299Z Entering 'third_party/tensorpipe' 2025-12-04T09:19:02.4515556Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:19:02.4572927Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:19:02.4630436Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:19:02.4687973Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:19:02.4742683Z Entering 
'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:19:02.4826517Z ##[endgroup]
2025-12-04T09:19:02.4874034Z [command]/usr/bin/git log -1 --format=%H
2025-12-04T09:19:02.4902958Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:19:02.5035893Z ##[group]Run cd "${GITHUB_WORKSPACE}"
2025-12-04T09:19:02.5036233Z cd "${GITHUB_WORKSPACE}"
2025-12-04T09:19:02.5036530Z # Clean stale submodule dirs
2025-12-04T09:19:02.5036840Z if [ -z "${NO_SUDO}" ]; then
2025-12-04T09:19:02.5037221Z   sudo git submodule foreach --recursive git clean -ffdx
2025-12-04T09:19:02.5037590Z else
2025-12-04T09:19:02.5037881Z   git submodule foreach --recursive git clean -ffdx
2025-12-04T09:19:02.5038237Z fi
2025-12-04T09:19:02.5049945Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:02.5050311Z env:
2025-12-04T09:19:02.5050516Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:02.5050761Z   NO_SUDO: true
2025-12-04T09:19:02.5050970Z ##[endgroup]
2025-12-04T09:19:02.5480167Z Entering 'android/libs/fbjni'
2025-12-04T09:19:02.5530583Z Entering 'third_party/FP16'
2025-12-04T09:19:02.5575801Z Entering 'third_party/FXdiv'
2025-12-04T09:19:02.5621540Z Entering 'third_party/NNPACK'
2025-12-04T09:19:02.5672464Z Entering 'third_party/NVTX'
2025-12-04T09:19:02.5729250Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:19:02.5777703Z Entering 'third_party/XNNPACK'
2025-12-04T09:19:02.5937902Z Entering 'third_party/aiter'
2025-12-04T09:19:02.5997179Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:19:02.6147120Z Entering 'third_party/benchmark'
2025-12-04T09:19:02.6197396Z Entering 'third_party/composable_kernel'
2025-12-04T09:19:02.6358622Z Entering 'third_party/cpp-httplib'
2025-12-04T09:19:02.6407390Z Entering 'third_party/cpuinfo'
2025-12-04T09:19:02.6460695Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:19:02.6512228Z Entering 'third_party/cutlass'
2025-12-04T09:19:02.6646783Z Entering 'third_party/fbgemm'
2025-12-04T09:19:02.6731953Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:19:02.6775527Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:19:02.6931405Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:19:02.6981775Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:19:02.7117644Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:19:02.7163599Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:19:02.7204463Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:19:02.7268668Z Entering 'third_party/flash-attention' 2025-12-04T09:19:02.7338082Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:19:02.7469026Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:19:02.7589380Z Entering 'third_party/flatbuffers' 2025-12-04T09:19:02.7690522Z Entering 'third_party/fmt' 2025-12-04T09:19:02.7738323Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:19:02.7790839Z Entering 'third_party/gloo' 2025-12-04T09:19:02.7840266Z Entering 'third_party/googletest' 2025-12-04T09:19:02.7889513Z Entering 'third_party/ideep' 2025-12-04T09:19:02.7933322Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:19:02.8056116Z Entering 'third_party/ittapi' 2025-12-04T09:19:02.8105420Z Entering 'third_party/kineto' 2025-12-04T09:19:02.8155575Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:19:02.8206104Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:19:02.8270758Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:19:02.8315118Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:19:02.8360568Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:19:02.8401219Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:19:02.8447951Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:19:02.8491878Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:19:02.8540380Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:19:02.8595424Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:19:02.8639185Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:19:02.8683371Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:19:02.8750894Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:19:02.8806830Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:19:02.8850959Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:19:02.8900546Z Entering 'third_party/kleidiai' 2025-12-04T09:19:02.8958141Z Entering 'third_party/mimalloc' 2025-12-04T09:19:02.9009947Z Entering 'third_party/nlohmann' 2025-12-04T09:19:02.9077427Z Entering 'third_party/onnx' 2025-12-04T09:19:02.9543952Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:19:02.9597284Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:19:02.9676260Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:19:02.9720414Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:19:02.9766253Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:19:02.9813625Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:19:02.9871571Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:19:02.9916278Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:19:02.9959356Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:19:03.0002699Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:19:03.0068905Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:19:03.0119434Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:19:03.0480610Z Entering 'third_party/pocketfft'
2025-12-04T09:19:03.0531458Z Entering 'third_party/protobuf'
2025-12-04T09:19:03.0639361Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:19:03.0683028Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:19:03.0737212Z Entering 'third_party/psimd'
2025-12-04T09:19:03.0781443Z Entering 'third_party/pthreadpool'
2025-12-04T09:19:03.0826581Z Entering 'third_party/pybind11'
2025-12-04T09:19:03.0876721Z Entering 'third_party/python-peachpy'
2025-12-04T09:19:03.0923820Z Entering 'third_party/sleef'
2025-12-04T09:19:03.0973210Z Entering 'third_party/tensorpipe'
2025-12-04T09:19:03.1023302Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:19:03.1072495Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:19:03.1120354Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:19:03.1169129Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:19:03.1210961Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:19:03.1391197Z Prepare all required actions
2025-12-04T09:19:03.1391731Z Getting action download info
2025-12-04T09:19:03.2884990Z ##[group]Run ./.github/actions/setup-linux
2025-12-04T09:19:03.2885296Z env:
2025-12-04T09:19:03.2885506Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:03.2885759Z ##[endgroup]
2025-12-04T09:19:03.2923953Z ##[group]Run set -euo pipefail
2025-12-04T09:19:03.2924269Z set -euo pipefail
2025-12-04T09:19:03.2924558Z function get_ec2_metadata() {
2025-12-04T09:19:03.2924927Z   # Pulled from instance metadata endpoint for EC2
2025-12-04T09:19:03.2925547Z   # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
2025-12-04T09:19:03.2926113Z   category=$1
2025-12-04T09:19:03.2926626Z   # If it is GCP runner (runner name contains gcp), do not run this
2025-12-04T09:19:03.2927047Z   runner_name_str=i-0f694664a515f0ebd
2025-12-04T09:19:03.2927411Z   if [[ -f /.inarc ]]; then
2025-12-04T09:19:03.2927742Z     echo "ARC Runner, no info on ec2 metadata"
2025-12-04T09:19:03.2928123Z   elif [[ $runner_name_str == *"gcp"* ]]; then
2025-12-04T09:19:03.2928579Z     echo "Runner is from Google Cloud Platform, No info on ec2 metadata"
2025-12-04T09:19:03.2929017Z   else
2025-12-04T09:19:03.2929867Z     curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}"
2025-12-04T09:19:03.2930771Z   fi
2025-12-04T09:19:03.2930974Z }
2025-12-04T09:19:03.2931230Z echo "ami-id: $(get_ec2_metadata ami-id)"
2025-12-04T09:19:03.2931654Z echo "instance-id: $(get_ec2_metadata instance-id)"
2025-12-04T09:19:03.2932129Z echo "instance-type: $(get_ec2_metadata instance-type)"
2025-12-04T09:19:03.2932539Z echo "system info $(uname -a)"
2025-12-04T09:19:03.2941777Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:03.2942139Z env:
2025-12-04T09:19:03.2942336Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:03.2942577Z ##[endgroup]
2025-12-04T09:19:03.3132079Z ami-id: ami-08982f1c5bf93d976
2025-12-04T09:19:03.3255841Z instance-id: i-0f694664a515f0ebd
2025-12-04T09:19:03.3383323Z instance-type: g5.4xlarge
2025-12-04T09:19:03.3399065Z system info Linux ip-10-0-18-14.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep 9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
2025-12-04T09:19:03.3424854Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi
2025-12-04T09:19:03.3425448Z if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi
2025-12-04T09:19:03.3435264Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:03.3435636Z env:
2025-12-04T09:19:03.3435833Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:03.3436091Z ##[endgroup]
2025-12-04T09:19:04.9561322Z Thu Dec 4 09:19:04 2025
2025-12-04T09:19:04.9561879Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:19:04.9562412Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:19:04.9562925Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:19:04.9563450Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:19:04.9564007Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:19:04.9564464Z |                                         |                        |               MIG M. |
2025-12-04T09:19:04.9564812Z |=========================================+========================+======================|
2025-12-04T09:19:04.9658805Z |   0  NVIDIA A10G                    Off |   00000000:00:1E.0 Off |                    0 |
2025-12-04T09:19:04.9659652Z |  0%   24C    P0             52W /  300W |       0MiB /  23028MiB |      3%      Default |
2025-12-04T09:19:04.9660061Z |                                         |                        |                  N/A |
2025-12-04T09:19:04.9660477Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:19:04.9660779Z
2025-12-04T09:19:04.9661126Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:19:04.9661570Z | Processes:                                                                              |
2025-12-04T09:19:04.9662043Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:19:04.9662479Z |        ID   ID                                                               Usage      |
2025-12-04T09:19:04.9662988Z |=========================================================================================|
2025-12-04T09:19:04.9664135Z |  No running processes found                                                             |
2025-12-04T09:19:04.9664640Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:19:05.4070462Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:19:05.4071429Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:19:05.4084632Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:05.4085006Z env:
2025-12-04T09:19:05.4085216Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:05.4085470Z ##[endgroup]
2025-12-04T09:19:05.4150939Z ##[group]Run if systemctl is-active --quiet docker; then
2025-12-04T09:19:05.4151394Z if systemctl is-active --quiet docker; then
2025-12-04T09:19:05.4151775Z   echo "Docker daemon is running...";
2025-12-04T09:19:05.4152104Z else
2025-12-04T09:19:05.4152446Z   echo "Starting docker daemon..." && sudo systemctl start docker;
2025-12-04T09:19:05.4152865Z fi
2025-12-04T09:19:05.4161621Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:05.4162046Z env:
2025-12-04T09:19:05.4162243Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:05.4162500Z ##[endgroup]
2025-12-04T09:19:05.4266646Z Docker daemon is running...
2025-12-04T09:19:05.4306829Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T09:19:05.4307126Z with:
2025-12-04T09:19:05.4307318Z   shell: bash
2025-12-04T09:19:05.4307524Z   timeout_minutes: 5
2025-12-04T09:19:05.4308056Z   max_attempts: 3
2025-12-04T09:19:05.4308289Z   retry_wait_seconds: 30
2025-12-04T09:19:05.4310643Z   command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
    --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
# For LF Runners we need to make sure we also login to Meta's ECR docker registry too.
META_AWS_ACCOUNT_ID=308535385114
if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then
  aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
      --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
fi
2025-12-04T09:19:05.4313014Z   polling_interval_seconds: 1
2025-12-04T09:19:05.4313284Z   warning_on_retry: true
2025-12-04T09:19:05.4313535Z   continue_on_error: false
2025-12-04T09:19:05.4313767Z env:
2025-12-04T09:19:05.4313967Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:05.4314224Z   AWS_RETRY_MODE: standard
2025-12-04T09:19:05.4314466Z   AWS_MAX_ATTEMPTS: 5
2025-12-04T09:19:05.4314710Z   AWS_DEFAULT_REGION: us-east-1
2025-12-04T09:19:05.4314976Z ##[endgroup]
2025-12-04T09:19:06.5779334Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:19:06.5780598Z Configure a credential helper to remove this warning. See
2025-12-04T09:19:06.5781345Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:19:06.5781822Z
2025-12-04T09:19:06.5781934Z Login Succeeded
2025-12-04T09:19:07.5162343Z Command completed after 1 attempt(s).
2025-12-04T09:19:07.5247910Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:19:07.5248416Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:19:07.5248869Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:19:07.5260951Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:07.5261317Z env:
2025-12-04T09:19:07.5261509Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:07.5261753Z ##[endgroup]
2025-12-04T09:19:07.5358931Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T09:19:07.5359482Z # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T09:19:07.5359979Z # shellcheck disable=SC2046
2025-12-04T09:19:07.5360296Z docker stop $(docker ps -q) || true
2025-12-04T09:19:07.5360627Z # Prune all of the docker images
2025-12-04T09:19:07.5360941Z docker system prune -af
2025-12-04T09:19:07.5369472Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:19:07.5369828Z env:
2025-12-04T09:19:07.5370036Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:19:07.5370284Z ##[endgroup]
2025-12-04T09:19:07.5714959Z "docker stop" requires at least 1 argument.
2025-12-04T09:19:07.5715358Z See 'docker stop --help'.
2025-12-04T09:19:07.5715527Z
2025-12-04T09:19:07.5715687Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...]
2025-12-04T09:19:07.5715964Z 2025-12-04T09:19:07.5716067Z Stop one or more running containers 2025-12-04T09:19:07.5954816Z Total reclaimed space: 0B 2025-12-04T09:19:07.6150856Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-12-04T09:19:07.6151331Z with: 2025-12-04T09:19:07.6152148Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:07.6153071Z use-custom-docker-registry: true 2025-12-04T09:19:07.6153389Z docker-build-dir: .ci/docker 2025-12-04T09:19:07.6153683Z docker-build-script: ./build.sh 2025-12-04T09:19:07.6153977Z working-directory: . 2025-12-04T09:19:07.6154328Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:07.6154724Z force-push: false 2025-12-04T09:19:07.6154946Z env: 2025-12-04T09:19:07.6155145Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:19:07.6155396Z ##[endgroup] 2025-12-04T09:19:07.6173965Z ##[group]Run set -ex 2025-12-04T09:19:07.6174229Z set -ex 2025-12-04T09:19:07.6174460Z  2025-12-04T09:19:07.6174871Z # If the docker build directory or the build script doesn't exist, the action will 2025-12-04T09:19:07.6175542Z # gracefully return the docker image name as it is. Pulling docker image in Linux 2025-12-04T09:19:07.6176128Z # job could then download the pre-built image as usual 2025-12-04T09:19:07.6176823Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T09:19:07.6177514Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T09:19:07.6177841Z else 2025-12-04T09:19:07.6178088Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T09:19:07.6178537Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:19:07.6178931Z  2025-12-04T09:19:07.6179544Z  echo "Not using custom ECR registry. 
Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-12-04T09:19:07.6180189Z  exit 0 2025-12-04T09:19:07.6180401Z fi 2025-12-04T09:19:07.6180607Z  2025-12-04T09:19:07.6180944Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T09:19:07.6181544Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T09:19:07.6182070Z  # use it as it is, but first let's extract the tag 2025-12-04T09:19:07.6182549Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T09:19:07.6183060Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:19:07.6183548Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:19:07.6183959Z else 2025-12-04T09:19:07.6184226Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T09:19:07.6184609Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T09:19:07.6185177Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T09:19:07.6185513Z  fi 2025-12-04T09:19:07.6185975Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T09:19:07.6186587Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:19:07.6187243Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:19:07.6188008Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:19:07.6188433Z fi 2025-12-04T09:19:07.6197634Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:19:07.6198008Z env: 2025-12-04T09:19:07.6198216Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:19:07.6198468Z REPO_NAME: pytorch 2025-12-04T09:19:07.6199447Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:07.6200356Z 
DOCKER_BUILD_DIR: .ci/docker 2025-12-04T09:19:07.6213528Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T09:19:07.6213917Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:07.6214314Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T09:19:07.6214597Z CUSTOM_TAG_PREFIX: 2025-12-04T09:19:07.6214830Z ##[endgroup] 2025-12-04T09:19:07.6245977Z + [[ -d .ci/docker ]] 2025-12-04T09:19:07.6246669Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T09:19:07.6247129Z + [[ true == \t\r\u\e ]] 2025-12-04T09:19:07.6247503Z + echo skip=false 2025-12-04T09:19:07.6248912Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T09:19:07.6255820Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:07.6256686Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T09:19:07.6292606Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:07.6293523Z + echo docker-tag=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:07.6294735Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:07.6322622Z ##[group]Run set +e 2025-12-04T09:19:07.6322903Z set +e 2025-12-04T09:19:07.6323110Z set -x 2025-12-04T09:19:07.6323335Z  2025-12-04T09:19:07.6323603Z login() { 2025-12-04T09:19:07.6324147Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T09:19:07.6324670Z } 2025-12-04T09:19:07.6324871Z  2025-12-04T09:19:07.6325068Z retry () { 2025-12-04T09:19:07.6325322Z  $* || 
(sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T09:19:07.6325630Z } 2025-12-04T09:19:07.6325826Z  2025-12-04T09:19:07.6326044Z retry login "${DOCKER_REGISTRY}" 2025-12-04T09:19:07.6326347Z  2025-12-04T09:19:07.6326554Z START_TIME=$(date +%s) 2025-12-04T09:19:07.6326832Z # Wait up to 120 minutes 2025-12-04T09:19:07.6327229Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T09:19:07.6327733Z  # Check if image already exists, if it does then skip building it 2025-12-04T09:19:07.6328222Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T09:19:07.6328571Z  exit 0 2025-12-04T09:19:07.6328794Z  fi 2025-12-04T09:19:07.6329002Z  2025-12-04T09:19:07.6329571Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T09:19:07.6330240Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T09:19:07.6330906Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T09:19:07.6331427Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T09:19:07.6331813Z  # It's a Docker build job, let's build the image 2025-12-04T09:19:07.6332160Z  break 2025-12-04T09:19:07.6332382Z  else 2025-12-04T09:19:07.6332706Z  # It's a regular build job, wait for the image to become available 2025-12-04T09:19:07.6333113Z  sleep 300 2025-12-04T09:19:07.6333355Z  fi 2025-12-04T09:19:07.6333558Z done 2025-12-04T09:19:07.6333751Z  2025-12-04T09:19:07.6334091Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T09:19:07.6334830Z # be empty. 
The default action would be to continue rebuild the image 2025-12-04T09:19:07.6335340Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T09:19:07.6335799Z  # if we're on the base branch then use the parent commit 2025-12-04T09:19:07.6336198Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T09:19:07.6336500Z else 2025-12-04T09:19:07.6336803Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T09:19:07.6337274Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T09:19:07.6337636Z fi 2025-12-04T09:19:07.6337827Z  2025-12-04T09:19:07.6338047Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T09:19:07.6338398Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T09:19:07.6338711Z  2025-12-04T09:19:07.6339272Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T09:19:07.6339837Z  exit 0 2025-12-04T09:19:07.6340054Z fi 2025-12-04T09:19:07.6340247Z  2025-12-04T09:19:07.6340544Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T09:19:07.6341230Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T09:19:07.6341805Z  exit 1 2025-12-04T09:19:07.6342018Z fi 2025-12-04T09:19:07.6342215Z  2025-12-04T09:19:07.6342562Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T09:19:07.6343221Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-12-04T09:19:07.6343804Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T09:19:07.6344501Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T09:19:07.6345287Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T09:19:07.6345741Z fi 2025-12-04T09:19:07.6345943Z  2025-12-04T09:19:07.6346202Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 
2025-12-04T09:19:07.6355401Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:19:07.6355777Z env: 2025-12-04T09:19:07.6355979Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:19:07.6356237Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T09:19:07.6356574Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:19:07.6357569Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:07.6358730Z DOCKER_TAG: pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:07.6359499Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:07.6359900Z DOCKER_PUSH: 2025-12-04T09:19:07.6360122Z ##[endgroup] 2025-12-04T09:19:07.6389971Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:07.6390386Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:07.6393044Z + aws ecr get-login-password --region us-east-1 2025-12-04T09:19:07.6395094Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:08.1589916Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:19:08.1590535Z Configure a credential helper to remove this warning. 
See 2025-12-04T09:19:08.1591102Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:19:08.1591495Z 2025-12-04T09:19:08.1598978Z Login Succeeded 2025-12-04T09:19:08.1626607Z ++ date +%s 2025-12-04T09:19:08.1641090Z + START_TIME=1764839948 2025-12-04T09:19:08.1644709Z ++ date +%s 2025-12-04T09:19:08.1658309Z + [[ 1764832748 -lt 1764839948 ]] 2025-12-04T09:19:08.1659292Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:08.3836192Z { 2025-12-04T09:19:08.3836488Z "schemaVersion": 2, 2025-12-04T09:19:08.3837050Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T09:19:08.3837653Z "config": { 2025-12-04T09:19:08.3838059Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T09:19:08.3838547Z "size": 34864, 2025-12-04T09:19:08.3839053Z "digest": "sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301" 2025-12-04T09:19:08.3839641Z }, 2025-12-04T09:19:08.3839868Z "layers": [ 2025-12-04T09:19:08.3840118Z { 2025-12-04T09:19:08.3840518Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3841033Z "size": 30447951, 2025-12-04T09:19:08.3841586Z "digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63" 2025-12-04T09:19:08.3842180Z }, 2025-12-04T09:19:08.3842410Z { 2025-12-04T09:19:08.3842805Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3843314Z "size": 1554, 2025-12-04T09:19:08.3843805Z "digest": "sha256:0678d56345c994444b77bb70b1177189d23e794748b1d75ffc45d227c7dea94a" 2025-12-04T09:19:08.3844369Z }, 2025-12-04T09:19:08.3844605Z { 2025-12-04T09:19:08.3845008Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3845518Z "size": 313275661, 2025-12-04T09:19:08.3846066Z "digest": 
"sha256:45f5c9ddfce78349dff3d5edfbaa0310ae17311f66abdcd7e00fa21b500e801c" 2025-12-04T09:19:08.3846672Z }, 2025-12-04T09:19:08.3846902Z { 2025-12-04T09:19:08.3847307Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3847825Z "size": 787, 2025-12-04T09:19:08.3848350Z "digest": "sha256:086b1df51ac1162d9c45698e9dfaf91c6c222c8bd9ab01797ac8f9344bc8044f" 2025-12-04T09:19:08.3848953Z }, 2025-12-04T09:19:08.3849190Z { 2025-12-04T09:19:08.3849598Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3850101Z "size": 106, 2025-12-04T09:19:08.3850613Z "digest": "sha256:fe8a7b64bf98352f89057bcba66beef2fb44cc05fbd3606abccd8e86cf476234" 2025-12-04T09:19:08.3851205Z }, 2025-12-04T09:19:08.3851577Z { 2025-12-04T09:19:08.3851985Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3852506Z "size": 703, 2025-12-04T09:19:08.3853003Z "digest": "sha256:7680723e9a578033dd106b45784c639f06cc8adb1f5239ec513d9de01087c1af" 2025-12-04T09:19:08.3853594Z }, 2025-12-04T09:19:08.3853833Z { 2025-12-04T09:19:08.3854238Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3854760Z "size": 1216, 2025-12-04T09:19:08.3855261Z "digest": "sha256:9c5027aeeb4e3101f48c1d2e400c387110e1009e42497ee801f1b4b7f7edb5c0" 2025-12-04T09:19:08.3856222Z }, 2025-12-04T09:19:08.3856458Z { 2025-12-04T09:19:08.3856876Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3857440Z "size": 483, 2025-12-04T09:19:08.3857918Z "digest": "sha256:9a56521103600bd37a1e7c1191b5136c2d738c092f8a6701499f7068a32c2628" 2025-12-04T09:19:08.3858507Z }, 2025-12-04T09:19:08.3858732Z { 2025-12-04T09:19:08.3859238Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3859776Z "size": 110361875, 2025-12-04T09:19:08.3860313Z "digest": "sha256:375c4427e9141269458333b1463fdb219e736fd6231ec1c56c625c48437ace77" 2025-12-04T09:19:08.3860907Z }, 
2025-12-04T09:19:08.3861139Z { 2025-12-04T09:19:08.3861545Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3862079Z "size": 4961, 2025-12-04T09:19:08.3862609Z "digest": "sha256:a86faaa7dbdd70e678e5ea20072637ee42618921ca8f80ca089f789325d4b0c2" 2025-12-04T09:19:08.3863224Z }, 2025-12-04T09:19:08.3863460Z { 2025-12-04T09:19:08.3864079Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3864621Z "size": 1755, 2025-12-04T09:19:08.3865144Z "digest": "sha256:fb7848686804957915d98f8655ef6da0fe4c521b50a82aefdebf475983505a15" 2025-12-04T09:19:08.3865746Z }, 2025-12-04T09:19:08.3865979Z { 2025-12-04T09:19:08.3866396Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3866928Z "size": 724, 2025-12-04T09:19:08.3867442Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T09:19:08.3868043Z }, 2025-12-04T09:19:08.3868286Z { 2025-12-04T09:19:08.3868694Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3869212Z "size": 543, 2025-12-04T09:19:08.3869696Z "digest": "sha256:79dc80f426b29d4ae9157b967050b03e66aa0c4b1295b944a1dd70106be87066" 2025-12-04T09:19:08.3870158Z }, 2025-12-04T09:19:08.3870339Z { 2025-12-04T09:19:08.3870657Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3871060Z "size": 3185190117, 2025-12-04T09:19:08.3871498Z "digest": "sha256:a13fcc1b90bb9c251ebe7ef2a03c4cb3afa1c8bdafe84f5f85136773059a3735" 2025-12-04T09:19:08.3871980Z }, 2025-12-04T09:19:08.3872152Z { 2025-12-04T09:19:08.3872463Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3872866Z "size": 32, 2025-12-04T09:19:08.3873264Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:19:08.3873795Z }, 2025-12-04T09:19:08.3873984Z { 2025-12-04T09:19:08.3874306Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3874710Z "size": 396, 2025-12-04T09:19:08.3875118Z "digest": "sha256:549db4d6c618ecd9534658a233e3c90508f82d8735f965c2786b2eaa078869e5" 2025-12-04T09:19:08.3875592Z }, 2025-12-04T09:19:08.3875763Z { 2025-12-04T09:19:08.3876078Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3876499Z "size": 236860, 2025-12-04T09:19:08.3876901Z "digest": "sha256:5c63528cb580001e65104f4cb0809bf0673a00f989a7db42fd6d86aa1ec27cee" 2025-12-04T09:19:08.3877374Z }, 2025-12-04T09:19:08.3877564Z { 2025-12-04T09:19:08.3877874Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3878287Z "size": 231, 2025-12-04T09:19:08.3878699Z "digest": "sha256:75bd83b989a44e4d4119a3f972891025eb0e9ce95cfbe4a0ca5cdbe7130028d6" 2025-12-04T09:19:08.3879171Z }, 2025-12-04T09:19:08.3879349Z { 2025-12-04T09:19:08.3879665Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3880077Z "size": 3043497, 2025-12-04T09:19:08.3880488Z "digest": "sha256:de6e78970f517178cb91f36cd02bd9ca7b72a08fb82a0f9007516026f258c035" 2025-12-04T09:19:08.3880970Z }, 2025-12-04T09:19:08.3881153Z { 2025-12-04T09:19:08.3881460Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3881981Z "size": 1472, 2025-12-04T09:19:08.3882396Z "digest": "sha256:e13ed7c7e4736e81dc21af755b3363eb26e4d3b2f1ca988dfe65effa47d8fa42" 2025-12-04T09:19:08.3882870Z }, 2025-12-04T09:19:08.3883045Z { 2025-12-04T09:19:08.3883358Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3883763Z "size": 481, 2025-12-04T09:19:08.3884167Z "digest": "sha256:6e2949bcb74152577a0f20c38bcb6dd80f5e68427e3e531a80e08c9ecc73a979" 2025-12-04T09:19:08.3884640Z }, 2025-12-04T09:19:08.3884818Z { 2025-12-04T09:19:08.3885130Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3885538Z "size": 202, 
2025-12-04T09:19:08.3885956Z "digest": "sha256:14d69d9aaec70287efd2fd35c4f93e43a29a4098458cc9fca1c93f02ad7356cb" 2025-12-04T09:19:08.3886425Z }, 2025-12-04T09:19:08.3886605Z { 2025-12-04T09:19:08.3886926Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3887371Z "size": 607, 2025-12-04T09:19:08.3887898Z "digest": "sha256:5c02769dd8e5bba2f7f5fd84bde9595fcb3bdbffcae497503fa846f9b5e78bf5" 2025-12-04T09:19:08.3888381Z }, 2025-12-04T09:19:08.3888553Z { 2025-12-04T09:19:08.3888869Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3889291Z "size": 7889619584, 2025-12-04T09:19:08.3889707Z "digest": "sha256:35041ce524ac4afec40ecd73b1393c830614f1f79d43a6439767a6c7d5b7027b" 2025-12-04T09:19:08.3890178Z }, 2025-12-04T09:19:08.3890353Z { 2025-12-04T09:19:08.3890669Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3891071Z "size": 830, 2025-12-04T09:19:08.3891487Z "digest": "sha256:2fa92dc5885e080e049ceb4139288b6c0e39fab34256945708b08ea55a1f7a0b" 2025-12-04T09:19:08.3891959Z }, 2025-12-04T09:19:08.3892138Z { 2025-12-04T09:19:08.3892461Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3892876Z "size": 33451739, 2025-12-04T09:19:08.3893300Z "digest": "sha256:2b85eafbd92a0e70a0a70154ad8bf4584095e576d95873368f30373f5966714a" 2025-12-04T09:19:08.3893773Z }, 2025-12-04T09:19:08.3893955Z { 2025-12-04T09:19:08.3894268Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3894679Z "size": 104, 2025-12-04T09:19:08.3895101Z "digest": "sha256:ff755a4ddad7880f23c6b767d432d6f1eafdb62b3ea18f8a98e22c441c099fcb" 2025-12-04T09:19:08.3895585Z }, 2025-12-04T09:19:08.3895762Z { 2025-12-04T09:19:08.3896093Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3896513Z "size": 1496, 2025-12-04T09:19:08.3896940Z "digest": 
"sha256:09eb41bdf42d8605b57b2363348154140904dec914b34a67298b82122bfce2b3" 2025-12-04T09:19:08.3897415Z }, 2025-12-04T09:19:08.3897592Z { 2025-12-04T09:19:08.3897926Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3898359Z "size": 458787828, 2025-12-04T09:19:08.3898788Z "digest": "sha256:11ede4d59e935e62f41b33220fe871794ab5e57ce724173b713368977683bcf6" 2025-12-04T09:19:08.3899348Z }, 2025-12-04T09:19:08.3899534Z { 2025-12-04T09:19:08.3899860Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3912183Z "size": 164, 2025-12-04T09:19:08.3912766Z "digest": "sha256:1283cd8f801a142172f3ab76fd472df8583223d9437de3e4d18d8cf98ea3fa98" 2025-12-04T09:19:08.3913244Z }, 2025-12-04T09:19:08.3913414Z { 2025-12-04T09:19:08.3913738Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3914146Z "size": 346, 2025-12-04T09:19:08.3914554Z "digest": "sha256:024fa855425fa524ad4500660cf61d53be62b99556d31b8b280d14caba434a35" 2025-12-04T09:19:08.3915014Z }, 2025-12-04T09:19:08.3915194Z { 2025-12-04T09:19:08.3915515Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3915917Z "size": 32, 2025-12-04T09:19:08.3916329Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:19:08.3916997Z }, 2025-12-04T09:19:08.3917177Z { 2025-12-04T09:19:08.3917490Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3917901Z "size": 106, 2025-12-04T09:19:08.3918305Z "digest": "sha256:303e6747a62efecf5efa1f97d0e66b40a3b39da8d79a51f75b89f4c92ae7ec52" 2025-12-04T09:19:08.3918782Z }, 2025-12-04T09:19:08.3918962Z { 2025-12-04T09:19:08.3919271Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3919675Z "size": 424, 2025-12-04T09:19:08.3920099Z "digest": "sha256:3017cdf4838bcc9a33daebc07487f8ae1f6bd6e7ce8322c14f5480e8db9ef90e" 2025-12-04T09:19:08.3920583Z }, 
2025-12-04T09:19:08.3920760Z { 2025-12-04T09:19:08.3921070Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3921482Z "size": 19309374, 2025-12-04T09:19:08.3921900Z "digest": "sha256:6b6cd1c358e886dc6ed7fd46ac4bcc1a0a73b7b1301739ea1953478ee5d83f50" 2025-12-04T09:19:08.3922560Z }, 2025-12-04T09:19:08.3922730Z { 2025-12-04T09:19:08.3923166Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3923570Z "size": 108, 2025-12-04T09:19:08.3923967Z "digest": "sha256:b2dd045011241d1cf8889e2a7369d9fe4844dfe15529b520ccd6a59bd3c1532e" 2025-12-04T09:19:08.3924418Z }, 2025-12-04T09:19:08.3924597Z { 2025-12-04T09:19:08.3924905Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3925298Z "size": 827, 2025-12-04T09:19:08.3925694Z "digest": "sha256:55adc51fe5897031d4cf2f2b8fd162213f6e46a52848630c616606271b97952e" 2025-12-04T09:19:08.3926159Z }, 2025-12-04T09:19:08.3926334Z { 2025-12-04T09:19:08.3926634Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3927044Z "size": 724, 2025-12-04T09:19:08.3927492Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T09:19:08.3927940Z }, 2025-12-04T09:19:08.3928118Z { 2025-12-04T09:19:08.3928442Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3928835Z "size": 149, 2025-12-04T09:19:08.3929230Z "digest": "sha256:a43ca0e4b837964b12b7469194cfe939c26de027298040028975324dce25938a" 2025-12-04T09:19:08.3929690Z }, 2025-12-04T09:19:08.3929861Z { 2025-12-04T09:19:08.3930176Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3930577Z "size": 138, 2025-12-04T09:19:08.3930984Z "digest": "sha256:b7212f17fd1404837fcfdd086dd0e2667931e4db377d45d8d89a44390c84e11d" 2025-12-04T09:19:08.3931447Z }, 2025-12-04T09:19:08.3931619Z { 2025-12-04T09:19:08.3931936Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3932332Z "size": 141, 2025-12-04T09:19:08.3932735Z "digest": "sha256:083e42cac090e6486c35f392b64ee54448f5e4aa947003aeb3e1f92c8ea5c099" 2025-12-04T09:19:08.3933204Z }, 2025-12-04T09:19:08.3933395Z { 2025-12-04T09:19:08.3933802Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3934230Z "size": 32, 2025-12-04T09:19:08.3934631Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:19:08.3935107Z }, 2025-12-04T09:19:08.3935287Z { 2025-12-04T09:19:08.3935592Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3935997Z "size": 223, 2025-12-04T09:19:08.3936405Z "digest": "sha256:0a00b784a4aac341795729b254f7edd09e811b7f51d0c58e0e6bfeeee6940503" 2025-12-04T09:19:08.3936872Z }, 2025-12-04T09:19:08.3937041Z { 2025-12-04T09:19:08.3937367Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3937789Z "size": 255, 2025-12-04T09:19:08.3938188Z "digest": "sha256:c6173c779f7ba143a21214ea5f032b141863a37ceb4c0ac01d3248c216ce5241" 2025-12-04T09:19:08.3938658Z }, 2025-12-04T09:19:08.3938827Z { 2025-12-04T09:19:08.3939197Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3939705Z "size": 145520672, 2025-12-04T09:19:08.3940124Z "digest": "sha256:ed3d1e3387b924585c332bf1bc252fa159cd0d25256a874043ff0141b1ab5ff7" 2025-12-04T09:19:08.3940581Z }, 2025-12-04T09:19:08.3940752Z { 2025-12-04T09:19:08.3941055Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3941447Z "size": 106, 2025-12-04T09:19:08.3941837Z "digest": "sha256:b29343478586aeee19d2a622661716f6f1591280c890f49b727a8da13a610784" 2025-12-04T09:19:08.3942287Z }, 2025-12-04T09:19:08.3942461Z { 2025-12-04T09:19:08.3942765Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3943171Z "size": 312293530, 
2025-12-04T09:19:08.3943587Z "digest": "sha256:c6f0520487fb506bc4601fd84d5f28d8a76b203e004731e4b2067c2ab1a14e0b" 2025-12-04T09:19:08.3944047Z }, 2025-12-04T09:19:08.3944223Z { 2025-12-04T09:19:08.3944546Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3944950Z "size": 3058011133, 2025-12-04T09:19:08.3945450Z "digest": "sha256:148171691cd4c4d20310d490d4b4dd903490d04ea07fb8f7e668a28768683e9a" 2025-12-04T09:19:08.3945914Z }, 2025-12-04T09:19:08.3946083Z { 2025-12-04T09:19:08.3946392Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3946797Z "size": 129, 2025-12-04T09:19:08.3947196Z "digest": "sha256:2c666d30ed77fff9ff1167d41cd645dad98280fcbe941f5bc3828c7ae66b1287" 2025-12-04T09:19:08.3947662Z }, 2025-12-04T09:19:08.3947830Z { 2025-12-04T09:19:08.3948137Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3948527Z "size": 880, 2025-12-04T09:19:08.3948928Z "digest": "sha256:5d8d3a0a98e012c5068e0f3bae5a03e3148ecf2d063634eee4c9241a1e3fdfb5" 2025-12-04T09:19:08.3949402Z }, 2025-12-04T09:19:08.3949570Z { 2025-12-04T09:19:08.3949881Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3950283Z "size": 724, 2025-12-04T09:19:08.3950684Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T09:19:08.3951142Z }, 2025-12-04T09:19:08.3951324Z { 2025-12-04T09:19:08.3951629Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3952036Z "size": 139, 2025-12-04T09:19:08.3952433Z "digest": "sha256:b06bafce9e817295d8127207747c80aa18e04392ff0875844fc30a1e794a8a0c" 2025-12-04T09:19:08.3952892Z }, 2025-12-04T09:19:08.3953062Z { 2025-12-04T09:19:08.3953377Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3953780Z "size": 32, 2025-12-04T09:19:08.3954182Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 
2025-12-04T09:19:08.3954659Z }, 2025-12-04T09:19:08.3954840Z { 2025-12-04T09:19:08.3955141Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3955543Z "size": 159, 2025-12-04T09:19:08.3955942Z "digest": "sha256:15e0d7e4590d3d8f598d05aec3a92f891bf8b4605bcc38cc2de852b6014ef8f3" 2025-12-04T09:19:08.3956419Z }, 2025-12-04T09:19:08.3956598Z { 2025-12-04T09:19:08.3956906Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3957336Z "size": 1011, 2025-12-04T09:19:08.3957761Z "digest": "sha256:a514bd1add3164d8d7ca99aa19294c4ed8b97b074635d98714c4f598a959f4cd" 2025-12-04T09:19:08.3958233Z }, 2025-12-04T09:19:08.3958403Z { 2025-12-04T09:19:08.3958711Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3959115Z "size": 724, 2025-12-04T09:19:08.3959498Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T09:19:08.3959943Z }, 2025-12-04T09:19:08.3960119Z { 2025-12-04T09:19:08.3960429Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3960820Z "size": 134, 2025-12-04T09:19:08.3961220Z "digest": "sha256:57b84ee6000204f27a1d9bca199b19be4c86ecd324540dbdf239c56a6c3b34ea" 2025-12-04T09:19:08.3961774Z }, 2025-12-04T09:19:08.3961937Z { 2025-12-04T09:19:08.3962247Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3962644Z "size": 32, 2025-12-04T09:19:08.3963040Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:19:08.3963499Z }, 2025-12-04T09:19:08.3963673Z { 2025-12-04T09:19:08.3963977Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3964375Z "size": 157, 2025-12-04T09:19:08.3964781Z "digest": "sha256:b8babeff6d817a5961dddc15c6bdfdbd05da187fae75d5804015f99fd7c066d8" 2025-12-04T09:19:08.3965261Z }, 2025-12-04T09:19:08.3965426Z { 2025-12-04T09:19:08.3965743Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3966154Z "size": 602, 2025-12-04T09:19:08.3966550Z "digest": "sha256:83779ddf6a85ab387f64a45f274cba245b69e4fd1931ff0b5d7d3efd4b7a43bc" 2025-12-04T09:19:08.3967023Z }, 2025-12-04T09:19:08.3967202Z { 2025-12-04T09:19:08.3967651Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3968051Z "size": 724, 2025-12-04T09:19:08.3968443Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T09:19:08.3968897Z }, 2025-12-04T09:19:08.3969068Z { 2025-12-04T09:19:08.3969383Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3969791Z "size": 155, 2025-12-04T09:19:08.3970188Z "digest": "sha256:8b7620c0d736cc79381207ce5afe2af90f0cd7f0cd394577d2c9520d7f74762f" 2025-12-04T09:19:08.3970657Z }, 2025-12-04T09:19:08.3970837Z { 2025-12-04T09:19:08.3971138Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3971534Z "size": 32, 2025-12-04T09:19:08.3971932Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:19:08.3972386Z }, 2025-12-04T09:19:08.3972553Z { 2025-12-04T09:19:08.3972867Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3973267Z "size": 188, 2025-12-04T09:19:08.3973674Z "digest": "sha256:3bcfa090e4efd3677425f76baea9f1e0c50a75d8c6b5713ec05310f1dff24539" 2025-12-04T09:19:08.3974150Z }, 2025-12-04T09:19:08.3974330Z { 2025-12-04T09:19:08.3974639Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3975040Z "size": 1370, 2025-12-04T09:19:08.3975448Z "digest": "sha256:eb0504ec4d9218a79896b604f73dc0ea5a0f96266ad9c2cdbbbe5f0f18222694" 2025-12-04T09:19:08.3975917Z }, 2025-12-04T09:19:08.3976073Z { 2025-12-04T09:19:08.3976370Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3976781Z "size": 32, 
2025-12-04T09:19:08.3977184Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:19:08.3977682Z }, 2025-12-04T09:19:08.3977883Z { 2025-12-04T09:19:08.3978190Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3978606Z "size": 136, 2025-12-04T09:19:08.3979104Z "digest": "sha256:15d0fec09d7b196a1462d51516ee90fc3443ba178d3e56d59cacf32146b4321d" 2025-12-04T09:19:08.3979566Z }, 2025-12-04T09:19:08.3979736Z { 2025-12-04T09:19:08.3980043Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3980438Z "size": 528, 2025-12-04T09:19:08.3980834Z "digest": "sha256:cca81fcc62a949959ca4dd3c9056fb293d548ef8607127eeeef6cfd3a8897ca8" 2025-12-04T09:19:08.3981312Z }, 2025-12-04T09:19:08.3981484Z { 2025-12-04T09:19:08.3981783Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3982180Z "size": 32, 2025-12-04T09:19:08.3982583Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:19:08.3983048Z }, 2025-12-04T09:19:08.3983231Z { 2025-12-04T09:19:08.3983532Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3984014Z "size": 104, 2025-12-04T09:19:08.3984431Z "digest": "sha256:b0b8f9b5c6ab98db9cd830dc584e1b6aec9add139e4cc48d8c243d36691e25b4" 2025-12-04T09:19:08.3984912Z }, 2025-12-04T09:19:08.3985077Z { 2025-12-04T09:19:08.3985405Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3985827Z "size": 435, 2025-12-04T09:19:08.3986238Z "digest": "sha256:0606ca4d47a8a70e91e92b03ca51a85e731641b09342136a54ef2f2a6d9dfb44" 2025-12-04T09:19:08.3986714Z }, 2025-12-04T09:19:08.3986899Z { 2025-12-04T09:19:08.3987224Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3987621Z "size": 32, 2025-12-04T09:19:08.3988038Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 
2025-12-04T09:19:08.3988515Z }, 2025-12-04T09:19:08.3988679Z { 2025-12-04T09:19:08.3988992Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3989398Z "size": 109, 2025-12-04T09:19:08.3989924Z "digest": "sha256:2f80a4e1b3b95ed67bb781ea787e8a63e46de79117d9d8e65c257072b38afa2d" 2025-12-04T09:19:08.3990399Z }, 2025-12-04T09:19:08.3990581Z { 2025-12-04T09:19:08.3990893Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3991309Z "size": 1896, 2025-12-04T09:19:08.3991721Z "digest": "sha256:35c916fb1bd057e517dcab78c3a2a018e68096d8993892ad84f47562d37ae352" 2025-12-04T09:19:08.3992189Z }, 2025-12-04T09:19:08.3992363Z { 2025-12-04T09:19:08.3992680Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3993093Z "size": 197526165, 2025-12-04T09:19:08.3993493Z "digest": "sha256:195537b7dafc96192f768323b1a8cc2a914d41959849b73198579576b0872a44" 2025-12-04T09:19:08.3993952Z }, 2025-12-04T09:19:08.3994136Z { 2025-12-04T09:19:08.3994448Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3994867Z "size": 106, 2025-12-04T09:19:08.3995281Z "digest": "sha256:dc454fd3967e5735b2498b7f1d958a2c626987d5e4ce225ca98da3cd945b59f3" 2025-12-04T09:19:08.3995757Z }, 2025-12-04T09:19:08.3995939Z { 2025-12-04T09:19:08.3996255Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3996669Z "size": 165, 2025-12-04T09:19:08.3997061Z "digest": "sha256:701b34f115fa897181c046dc37288e87cbc3ad74c36a9e2224b5bfe7c5703afb" 2025-12-04T09:19:08.3997574Z }, 2025-12-04T09:19:08.3997772Z { 2025-12-04T09:19:08.3998088Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.3998494Z "size": 7944, 2025-12-04T09:19:08.3998907Z "digest": "sha256:39cefc00ffedebc9098261c798408b87a20c95a88fccb110594077f48dadf760" 2025-12-04T09:19:08.3999377Z }, 2025-12-04T09:19:08.3999566Z { 2025-12-04T09:19:08.3999886Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.4000294Z "size": 8071, 2025-12-04T09:19:08.4000697Z "digest": "sha256:6ae51eb61a325b2c2995a5088c81aa20821b75be65b5aa722c7c40556b5d03ea" 2025-12-04T09:19:08.4001178Z }, 2025-12-04T09:19:08.4001349Z { 2025-12-04T09:19:08.4001664Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.4002074Z "size": 304, 2025-12-04T09:19:08.4002485Z "digest": "sha256:1fd5341e66dfc0c1ae23af014641a92a6fd02640c528fe6d4dc55921ed659a26" 2025-12-04T09:19:08.4002953Z }, 2025-12-04T09:19:08.4003128Z { 2025-12-04T09:19:08.4003448Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.4003844Z "size": 13364291, 2025-12-04T09:19:08.4004264Z "digest": "sha256:72a7c87e35e40ab796f90aee1b51add7902f0cdc44406d2505b6c6a1f55a8da6" 2025-12-04T09:19:08.4004742Z }, 2025-12-04T09:19:08.4004913Z { 2025-12-04T09:19:08.4005232Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.4005650Z "size": 108, 2025-12-04T09:19:08.4006061Z "digest": "sha256:ec36862ac98ebaac52ee1a8b1d162d45bd0e3bf59ae7e19c8f80ad3960b4c600" 2025-12-04T09:19:08.4006547Z }, 2025-12-04T09:19:08.4006808Z { 2025-12-04T09:19:08.4007121Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.4007534Z "size": 54145699, 2025-12-04T09:19:08.4008314Z "digest": "sha256:05ddbf246e8add0e293474dbf88bb028d5a295a25ac59e8648a18db644377773" 2025-12-04T09:19:08.4008968Z }, 2025-12-04T09:19:08.4009226Z { 2025-12-04T09:19:08.4009644Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:19:08.4010042Z "size": 32, 2025-12-04T09:19:08.4010433Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:19:08.4010904Z } 2025-12-04T09:19:08.4011071Z ] 2025-12-04T09:19:08.4011239Z } 2025-12-04T09:19:08.4011427Z + exit 0 2025-12-04T09:19:08.4037435Z ##[group]Run set -eux 2025-12-04T09:19:08.4037692Z set 
-eux 2025-12-04T09:19:08.4038086Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T09:19:08.4039391Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T09:19:08.4050107Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:19:08.4050474Z env: 2025-12-04T09:19:08.4050671Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:19:08.4050918Z ##[endgroup] 2025-12-04T09:19:08.4085074Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-12-04T09:19:08.4085811Z + jq --raw-output .SecretString 2025-12-04T09:19:08.4087199Z + jq -r .docker_hub_readonly_token 2025-12-04T09:19:08.4088926Z + docker login --username pytorchbot --password-stdin 2025-12-04T09:19:08.9911833Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:19:08.9912442Z Configure a credential helper to remove this warning. 
See 2025-12-04T09:19:08.9913007Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:19:08.9913411Z 2025-12-04T09:19:08.9913706Z Login Succeeded 2025-12-04T09:19:09.0043136Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T09:19:09.0043510Z tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T09:19:09.0043916Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}" 2025-12-04T09:19:09.0053126Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:19:09.0053493Z env: 2025-12-04T09:19:09.0053699Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:19:09.0054558Z ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:09.0055430Z ##[endgroup] 2025-12-04T09:19:09.0088077Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:09.0136554Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T09:19:09.0136998Z with: 2025-12-04T09:19:09.0137776Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:09.0138802Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:09.0139288Z env: 2025-12-04T09:19:09.0139482Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:19:09.0139733Z ##[endgroup] 2025-12-04T09:19:09.0154521Z ##[group]Run set -x 2025-12-04T09:19:09.0154784Z set -x 2025-12-04T09:19:09.0154994Z set +e 2025-12-04T09:19:09.0155202Z  2025-12-04T09:19:09.0155395Z login() { 2025-12-04T09:19:09.0155852Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T09:19:09.0156366Z } 2025-12-04T09:19:09.0156563Z  2025-12-04T09:19:09.0156783Z retry () { 2025-12-04T09:19:09.0157035Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 
2025-12-04T09:19:09.0157533Z } 2025-12-04T09:19:09.0157751Z  2025-12-04T09:19:09.0157968Z retry login "${DOCKER_REGISTRY}" 2025-12-04T09:19:09.0158259Z  2025-12-04T09:19:09.0158741Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T09:19:09.0159409Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T09:19:09.0159769Z  2025-12-04T09:19:09.0159965Z set -e 2025-12-04T09:19:09.0160297Z # ignore output since only exit code is used for conditional 2025-12-04T09:19:09.0160790Z # only pull docker image if it's not available locally 2025-12-04T09:19:09.0161321Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T09:19:09.0161832Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T09:19:09.0162148Z fi 2025-12-04T09:19:09.0171071Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:19:09.0171452Z env: 2025-12-04T09:19:09.0171652Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:19:09.0172488Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:09.0173453Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:09.0173833Z ##[endgroup] 2025-12-04T09:19:09.0204476Z + set +e 2025-12-04T09:19:09.0204767Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:09.0205188Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:09.0209229Z + aws ecr get-login-password --region us-east-1 2025-12-04T09:19:09.0214434Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:19:09.5545405Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:19:09.5546012Z Configure a credential helper to remove this warning. 
See 2025-12-04T09:19:09.5546591Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:19:09.5546980Z 2025-12-04T09:19:09.5547488Z Login Succeeded 2025-12-04T09:19:09.5577708Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:09.5578731Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T09:19:09.7777658Z + IMAGE_SIZE=15091.581844329834 2025-12-04T09:19:09.7778091Z Compressed size of image in MB: 15091.581844329834 2025-12-04T09:19:09.7778514Z + echo 'Compressed size of image in MB: 15091.581844329834' 2025-12-04T09:19:09.7778881Z + set -e 2025-12-04T09:19:09.7780197Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:09.7932778Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:09.7934254Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:19:10.0336551Z pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image 2025-12-04T09:19:10.0339394Z 63e5bc7682b8: Pulling fs layer 2025-12-04T09:19:10.0340129Z 0678d56345c9: Pulling fs layer 2025-12-04T09:19:10.0340728Z 45f5c9ddfce7: Pulling fs layer 2025-12-04T09:19:10.0341062Z 086b1df51ac1: Pulling fs layer 2025-12-04T09:19:10.0341445Z fe8a7b64bf98: Pulling fs layer 2025-12-04T09:19:10.0341782Z 7680723e9a57: Pulling fs layer 2025-12-04T09:19:10.0342169Z 9c5027aeeb4e: Pulling fs layer 2025-12-04T09:19:10.0342557Z 9a5652110360: Pulling fs layer 2025-12-04T09:19:10.0342931Z 
375c4427e914: Pulling fs layer 2025-12-04T09:19:10.0343637Z a86faaa7dbdd: Pulling fs layer 2025-12-04T09:19:10.0344003Z fb7848686804: Pulling fs layer 2025-12-04T09:19:10.0344358Z 3541df015cdb: Pulling fs layer 2025-12-04T09:19:10.0344734Z 79dc80f426b2: Pulling fs layer 2025-12-04T09:19:10.0345103Z a13fcc1b90bb: Pulling fs layer 2025-12-04T09:19:10.0345476Z 4f4fb700ef54: Pulling fs layer 2025-12-04T09:19:10.0345842Z 549db4d6c618: Pulling fs layer 2025-12-04T09:19:10.0346194Z 5c63528cb580: Pulling fs layer 2025-12-04T09:19:10.0346560Z 75bd83b989a4: Pulling fs layer 2025-12-04T09:19:10.0346922Z de6e78970f51: Pulling fs layer 2025-12-04T09:19:10.0347275Z e13ed7c7e473: Pulling fs layer 2025-12-04T09:19:10.0347600Z fe8a7b64bf98: Waiting 2025-12-04T09:19:10.0347981Z 6e2949bcb741: Pulling fs layer 2025-12-04T09:19:10.0348391Z 14d69d9aaec7: Pulling fs layer 2025-12-04T09:19:10.0348743Z 7680723e9a57: Waiting 2025-12-04T09:19:10.0349094Z 5c02769dd8e5: Pulling fs layer 2025-12-04T09:19:10.0349470Z 35041ce524ac: Pulling fs layer 2025-12-04T09:19:10.0349841Z 9c5027aeeb4e: Waiting 2025-12-04T09:19:10.0350172Z 2fa92dc5885e: Pulling fs layer 2025-12-04T09:19:10.0350535Z 2b85eafbd92a: Pulling fs layer 2025-12-04T09:19:10.0350909Z ff755a4ddad7: Pulling fs layer 2025-12-04T09:19:10.0351292Z 09eb41bdf42d: Pulling fs layer 2025-12-04T09:19:10.0351622Z 11ede4d59e93: Pulling fs layer 2025-12-04T09:19:10.0351924Z 9a5652110360: Waiting 2025-12-04T09:19:10.0352252Z 1283cd8f801a: Pulling fs layer 2025-12-04T09:19:10.0352605Z 024fa855425f: Pulling fs layer 2025-12-04T09:19:10.0352968Z 303e6747a62e: Pulling fs layer 2025-12-04T09:19:10.0353330Z 3017cdf4838b: Pulling fs layer 2025-12-04T09:19:10.0353695Z 6b6cd1c358e8: Pulling fs layer 2025-12-04T09:19:10.0354034Z a13fcc1b90bb: Waiting 2025-12-04T09:19:10.0354252Z 375c4427e914: Waiting 2025-12-04T09:19:10.0354484Z b2dd04501124: Pulling fs layer 2025-12-04T09:19:10.0354754Z 55adc51fe589: Pulling fs layer 2025-12-04T09:19:10.0355035Z a43ca0e4b837: 
Pulling fs layer 2025-12-04T09:19:10.0355285Z 086b1df51ac1: Waiting 2025-12-04T09:19:10.0355521Z b7212f17fd14: Pulling fs layer 2025-12-04T09:19:10.0355785Z 083e42cac090: Pulling fs layer 2025-12-04T09:19:10.0356047Z 0a00b784a4aa: Pulling fs layer 2025-12-04T09:19:10.0356294Z 303e6747a62e: Waiting 2025-12-04T09:19:10.0356522Z a86faaa7dbdd: Waiting 2025-12-04T09:19:10.0356744Z fb7848686804: Waiting 2025-12-04T09:19:10.0356960Z 5c63528cb580: Waiting 2025-12-04T09:19:10.0357179Z 3541df015cdb: Waiting 2025-12-04T09:19:10.0357416Z c6173c779f7b: Pulling fs layer 2025-12-04T09:19:10.0357660Z b7212f17fd14: Waiting 2025-12-04T09:19:10.0357891Z ed3d1e3387b9: Pulling fs layer 2025-12-04T09:19:10.0369052Z 79dc80f426b2: Waiting 2025-12-04T09:19:10.0369327Z 4f4fb700ef54: Waiting 2025-12-04T09:19:10.0369557Z 75bd83b989a4: Waiting 2025-12-04T09:19:10.0369773Z 6b6cd1c358e8: Waiting 2025-12-04T09:19:10.0369992Z 549db4d6c618: Waiting 2025-12-04T09:19:10.0370452Z b2dd04501124: Waiting 2025-12-04T09:19:10.0370693Z b29343478586: Pulling fs layer 2025-12-04T09:19:10.0370970Z c6f0520487fb: Pulling fs layer 2025-12-04T09:19:10.0371241Z 148171691cd4: Pulling fs layer 2025-12-04T09:19:10.0371489Z de6e78970f51: Waiting 2025-12-04T09:19:10.0371726Z 2c666d30ed77: Pulling fs layer 2025-12-04T09:19:10.0371983Z e13ed7c7e473: Waiting 2025-12-04T09:19:10.0372216Z 5d8d3a0a98e0: Pulling fs layer 2025-12-04T09:19:10.0372477Z 0a00b784a4aa: Waiting 2025-12-04T09:19:10.0372717Z b06bafce9e81: Pulling fs layer 2025-12-04T09:19:10.0373048Z 2b85eafbd92a: Waiting 2025-12-04T09:19:10.0373293Z 6e2949bcb741: Waiting 2025-12-04T09:19:10.0373679Z 15e0d7e4590d: Pulling fs layer 2025-12-04T09:19:10.0373945Z a514bd1add31: Pulling fs layer 2025-12-04T09:19:10.0374195Z 09eb41bdf42d: Waiting 2025-12-04T09:19:10.0374426Z 11ede4d59e93: Waiting 2025-12-04T09:19:10.0374644Z c6173c779f7b: Waiting 2025-12-04T09:19:10.0374869Z 57b84ee60002: Pulling fs layer 2025-12-04T09:19:10.0375127Z b06bafce9e81: Waiting 
2025-12-04T09:19:10.0375370Z 5d8d3a0a98e0: Waiting 2025-12-04T09:19:10.0375640Z 15e0d7e4590d: Waiting 2025-12-04T09:19:10.0375896Z b8babeff6d81: Pulling fs layer 2025-12-04T09:19:10.0376266Z 2c666d30ed77: Waiting 2025-12-04T09:19:10.0376477Z ff755a4ddad7: Waiting 2025-12-04T09:19:10.0376696Z 35041ce524ac: Waiting 2025-12-04T09:19:10.0376922Z 83779ddf6a85: Pulling fs layer 2025-12-04T09:19:10.0377177Z b29343478586: Waiting 2025-12-04T09:19:10.0377394Z 57b84ee60002: Waiting 2025-12-04T09:19:10.0377623Z 8b7620c0d736: Pulling fs layer 2025-12-04T09:19:10.0377868Z 5c02769dd8e5: Waiting 2025-12-04T09:19:10.0378159Z b8babeff6d81: Waiting 2025-12-04T09:19:10.0378490Z 3bcfa090e4ef: Pulling fs layer 2025-12-04T09:19:10.0378842Z eb0504ec4d92: Pulling fs layer 2025-12-04T09:19:10.0379257Z 8b7620c0d736: Waiting 2025-12-04T09:19:10.0379558Z 83779ddf6a85: Waiting 2025-12-04T09:19:10.0379778Z 2fa92dc5885e: Waiting 2025-12-04T09:19:10.0379996Z 3bcfa090e4ef: Waiting 2025-12-04T09:19:10.0380233Z 15d0fec09d7b: Pulling fs layer 2025-12-04T09:19:10.0380498Z 148171691cd4: Waiting 2025-12-04T09:19:10.0380723Z cca81fcc62a9: Pulling fs layer 2025-12-04T09:19:10.0380982Z eb0504ec4d92: Waiting 2025-12-04T09:19:10.0381196Z 083e42cac090: Waiting 2025-12-04T09:19:10.0381412Z c6f0520487fb: Waiting 2025-12-04T09:19:10.0381634Z cca81fcc62a9: Waiting 2025-12-04T09:19:10.0381871Z b0b8f9b5c6ab: Pulling fs layer 2025-12-04T09:19:10.0382129Z a514bd1add31: Waiting 2025-12-04T09:19:10.0382440Z 0606ca4d47a8: Pulling fs layer 2025-12-04T09:19:10.0382748Z 2f80a4e1b3b9: Pulling fs layer 2025-12-04T09:19:10.0383006Z 35c916fb1bd0: Pulling fs layer 2025-12-04T09:19:10.0383276Z 195537b7dafc: Pulling fs layer 2025-12-04T09:19:10.0383523Z 14d69d9aaec7: Waiting 2025-12-04T09:19:10.0383748Z dc454fd3967e: Pulling fs layer 2025-12-04T09:19:10.0384004Z 2f80a4e1b3b9: Waiting 2025-12-04T09:19:10.0384235Z 701b34f115fa: Pulling fs layer 2025-12-04T09:19:10.0384501Z 39cefc00ffed: Pulling fs layer 2025-12-04T09:19:10.0384792Z 
6ae51eb61a32: Pulling fs layer 2025-12-04T09:19:10.0385066Z dc454fd3967e: Waiting 2025-12-04T09:19:10.0385285Z 701b34f115fa: Waiting 2025-12-04T09:19:10.0385498Z b0b8f9b5c6ab: Waiting 2025-12-04T09:19:10.0385724Z 39cefc00ffed: Waiting 2025-12-04T09:19:10.0385952Z 1fd5341e66df: Pulling fs layer 2025-12-04T09:19:10.0386201Z ed3d1e3387b9: Waiting 2025-12-04T09:19:10.0386427Z 6ae51eb61a32: Waiting 2025-12-04T09:19:10.0386655Z 72a7c87e35e4: Pulling fs layer 2025-12-04T09:19:10.0386898Z 0606ca4d47a8: Waiting 2025-12-04T09:19:10.0387112Z 1fd5341e66df: Waiting 2025-12-04T09:19:10.0387349Z ec36862ac98e: Pulling fs layer 2025-12-04T09:19:10.0387597Z 35c916fb1bd0: Waiting 2025-12-04T09:19:10.0387836Z 05ddbf246e8a: Pulling fs layer 2025-12-04T09:19:10.0388093Z ec36862ac98e: Waiting 2025-12-04T09:19:10.0388306Z 05ddbf246e8a: Waiting 2025-12-04T09:19:10.0388525Z 195537b7dafc: Waiting 2025-12-04T09:19:10.0388745Z 55adc51fe589: Waiting 2025-12-04T09:19:10.0388953Z 024fa855425f: Waiting 2025-12-04T09:19:10.0389171Z 1283cd8f801a: Waiting 2025-12-04T09:19:10.0389386Z 3017cdf4838b: Waiting 2025-12-04T09:19:10.1112927Z 0678d56345c9: Verifying Checksum 2025-12-04T09:19:10.1113370Z 0678d56345c9: Download complete 2025-12-04T09:19:10.1987221Z 086b1df51ac1: Verifying Checksum 2025-12-04T09:19:10.1987566Z 086b1df51ac1: Download complete 2025-12-04T09:19:10.2709975Z fe8a7b64bf98: Download complete 2025-12-04T09:19:10.3366706Z 7680723e9a57: Verifying Checksum 2025-12-04T09:19:10.3367163Z 7680723e9a57: Download complete 2025-12-04T09:19:10.3991806Z 63e5bc7682b8: Verifying Checksum 2025-12-04T09:19:10.3992288Z 63e5bc7682b8: Download complete 2025-12-04T09:19:10.4140939Z 9c5027aeeb4e: Verifying Checksum 2025-12-04T09:19:10.4154760Z 9c5027aeeb4e: Download complete 2025-12-04T09:19:10.4583590Z 9a5652110360: Download complete 2025-12-04T09:19:10.5312490Z a86faaa7dbdd: Verifying Checksum 2025-12-04T09:19:10.5312836Z a86faaa7dbdd: Download complete 2025-12-04T09:19:10.6046596Z fb7848686804: Download 
complete 2025-12-04T09:19:10.6751391Z 3541df015cdb: Verifying Checksum 2025-12-04T09:19:10.6751772Z 3541df015cdb: Download complete 2025-12-04T09:19:10.7356489Z 79dc80f426b2: Verifying Checksum 2025-12-04T09:19:10.7356819Z 79dc80f426b2: Download complete 2025-12-04T09:19:11.5660755Z 375c4427e914: Verifying Checksum 2025-12-04T09:19:11.5661220Z 375c4427e914: Download complete 2025-12-04T09:19:11.5748424Z 4f4fb700ef54: Verifying Checksum 2025-12-04T09:19:11.5748760Z 4f4fb700ef54: Download complete 2025-12-04T09:19:11.6316163Z 63e5bc7682b8: Pull complete 2025-12-04T09:19:11.6440279Z 549db4d6c618: Verifying Checksum 2025-12-04T09:19:11.6440612Z 549db4d6c618: Download complete 2025-12-04T09:19:11.6549321Z 0678d56345c9: Pull complete 2025-12-04T09:19:11.7542980Z 5c63528cb580: Download complete 2025-12-04T09:19:11.8643820Z 75bd83b989a4: Verifying Checksum 2025-12-04T09:19:11.8644113Z 75bd83b989a4: Download complete 2025-12-04T09:19:11.9792253Z de6e78970f51: Verifying Checksum 2025-12-04T09:19:11.9792709Z de6e78970f51: Download complete 2025-12-04T09:19:12.0519732Z e13ed7c7e473: Download complete 2025-12-04T09:19:12.1082669Z 6e2949bcb741: Verifying Checksum 2025-12-04T09:19:12.1082968Z 6e2949bcb741: Download complete 2025-12-04T09:19:12.2047661Z 14d69d9aaec7: Verifying Checksum 2025-12-04T09:19:12.2047948Z 14d69d9aaec7: Download complete 2025-12-04T09:19:12.2972458Z 5c02769dd8e5: Verifying Checksum 2025-12-04T09:19:12.2972747Z 5c02769dd8e5: Download complete 2025-12-04T09:19:13.2106194Z 45f5c9ddfce7: Verifying Checksum 2025-12-04T09:19:13.2106528Z 45f5c9ddfce7: Download complete 2025-12-04T09:19:13.2820869Z 2fa92dc5885e: Verifying Checksum 2025-12-04T09:19:13.2821214Z 2fa92dc5885e: Download complete 2025-12-04T09:19:13.6635879Z 2b85eafbd92a: Verifying Checksum 2025-12-04T09:19:13.6636216Z 2b85eafbd92a: Download complete 2025-12-04T09:19:13.7565460Z ff755a4ddad7: Verifying Checksum 2025-12-04T09:19:13.7565923Z ff755a4ddad7: Download complete 2025-12-04T09:19:13.8266209Z 
09eb41bdf42d: Verifying Checksum 2025-12-04T09:19:13.8266621Z 09eb41bdf42d: Download complete 2025-12-04T09:19:18.4581359Z 11ede4d59e93: Verifying Checksum 2025-12-04T09:19:18.4581792Z 11ede4d59e93: Download complete 2025-12-04T09:19:18.5360581Z 1283cd8f801a: Verifying Checksum 2025-12-04T09:19:18.5361084Z 1283cd8f801a: Download complete 2025-12-04T09:19:18.6186145Z 024fa855425f: Verifying Checksum 2025-12-04T09:19:18.6186630Z 024fa855425f: Download complete 2025-12-04T09:19:18.7141705Z 303e6747a62e: Download complete 2025-12-04T09:19:18.8105478Z 3017cdf4838b: Download complete 2025-12-04T09:19:19.0509326Z 6b6cd1c358e8: Verifying Checksum 2025-12-04T09:19:19.0509674Z 6b6cd1c358e8: Download complete 2025-12-04T09:19:19.1290624Z b2dd04501124: Verifying Checksum 2025-12-04T09:19:19.1290948Z b2dd04501124: Download complete 2025-12-04T09:19:19.2212522Z 55adc51fe589: Verifying Checksum 2025-12-04T09:19:19.2212871Z 55adc51fe589: Download complete 2025-12-04T09:19:19.2954066Z a43ca0e4b837: Verifying Checksum 2025-12-04T09:19:19.2954500Z a43ca0e4b837: Download complete 2025-12-04T09:19:19.3730794Z b7212f17fd14: Verifying Checksum 2025-12-04T09:19:19.3731610Z b7212f17fd14: Download complete 2025-12-04T09:19:19.4689481Z 083e42cac090: Verifying Checksum 2025-12-04T09:19:19.4689912Z 083e42cac090: Download complete 2025-12-04T09:19:19.5530578Z 0a00b784a4aa: Verifying Checksum 2025-12-04T09:19:19.5530919Z 0a00b784a4aa: Download complete 2025-12-04T09:19:19.6343747Z c6173c779f7b: Download complete 2025-12-04T09:19:21.1438209Z ed3d1e3387b9: Verifying Checksum 2025-12-04T09:19:21.1438557Z ed3d1e3387b9: Download complete 2025-12-04T09:19:21.2234376Z b29343478586: Verifying Checksum 2025-12-04T09:19:21.2234817Z b29343478586: Download complete 2025-12-04T09:19:22.9228479Z 45f5c9ddfce7: Pull complete 2025-12-04T09:19:23.0184336Z 086b1df51ac1: Pull complete 2025-12-04T09:19:23.1142588Z fe8a7b64bf98: Pull complete 2025-12-04T09:19:23.1958554Z 7680723e9a57: Pull complete 
2025-12-04T09:19:23.4051959Z 9c5027aeeb4e: Pull complete 2025-12-04T09:19:23.6305179Z 9a5652110360: Pull complete 2025-12-04T09:19:24.4094502Z c6f0520487fb: Verifying Checksum 2025-12-04T09:19:24.4094999Z c6f0520487fb: Download complete 2025-12-04T09:19:26.3061219Z 375c4427e914: Pull complete 2025-12-04T09:19:26.5161542Z a86faaa7dbdd: Pull complete 2025-12-04T09:19:26.7282534Z fb7848686804: Pull complete 2025-12-04T09:19:26.9268436Z 3541df015cdb: Pull complete 2025-12-04T09:19:27.1275372Z 79dc80f426b2: Pull complete 2025-12-04T09:19:42.6380951Z a13fcc1b90bb: Verifying Checksum 2025-12-04T09:19:42.6381416Z a13fcc1b90bb: Download complete 2025-12-04T09:19:42.7411886Z 2c666d30ed77: Verifying Checksum 2025-12-04T09:19:42.7412325Z 2c666d30ed77: Download complete 2025-12-04T09:19:42.8109852Z 5d8d3a0a98e0: Verifying Checksum 2025-12-04T09:19:42.8110565Z 5d8d3a0a98e0: Download complete 2025-12-04T09:19:42.8794555Z b06bafce9e81: Verifying Checksum 2025-12-04T09:19:42.8794958Z b06bafce9e81: Download complete 2025-12-04T09:19:42.9523808Z 15e0d7e4590d: Download complete 2025-12-04T09:19:43.0345089Z a514bd1add31: Verifying Checksum 2025-12-04T09:19:43.0345531Z a514bd1add31: Download complete 2025-12-04T09:19:43.1341843Z 57b84ee60002: Verifying Checksum 2025-12-04T09:19:43.1342224Z 57b84ee60002: Download complete 2025-12-04T09:19:43.2338446Z b8babeff6d81: Verifying Checksum 2025-12-04T09:19:43.2338943Z b8babeff6d81: Download complete 2025-12-04T09:19:43.3023835Z 83779ddf6a85: Verifying Checksum 2025-12-04T09:19:43.3024215Z 83779ddf6a85: Download complete 2025-12-04T09:19:43.3743007Z 8b7620c0d736: Download complete 2025-12-04T09:19:43.4591293Z 3bcfa090e4ef: Verifying Checksum 2025-12-04T09:19:43.4591674Z 3bcfa090e4ef: Download complete 2025-12-04T09:19:43.5421198Z eb0504ec4d92: Download complete 2025-12-04T09:19:43.6280066Z 15d0fec09d7b: Verifying Checksum 2025-12-04T09:19:43.6280502Z 15d0fec09d7b: Download complete 2025-12-04T09:19:43.7248867Z cca81fcc62a9: Verifying Checksum 
2025-12-04T09:19:43.7249208Z cca81fcc62a9: Download complete 2025-12-04T09:19:43.8131312Z b0b8f9b5c6ab: Verifying Checksum 2025-12-04T09:19:43.8132374Z b0b8f9b5c6ab: Download complete 2025-12-04T09:19:43.8986912Z 0606ca4d47a8: Download complete 2025-12-04T09:19:43.9769337Z 2f80a4e1b3b9: Verifying Checksum 2025-12-04T09:19:43.9769653Z 2f80a4e1b3b9: Download complete 2025-12-04T09:19:44.0347513Z 35c916fb1bd0: Verifying Checksum 2025-12-04T09:19:44.0347864Z 35c916fb1bd0: Download complete 2025-12-04T09:19:46.0474952Z 195537b7dafc: Verifying Checksum 2025-12-04T09:19:46.0475285Z 195537b7dafc: Download complete 2025-12-04T09:19:46.1293413Z dc454fd3967e: Download complete 2025-12-04T09:19:46.2151110Z 701b34f115fa: Verifying Checksum 2025-12-04T09:19:46.2151551Z 701b34f115fa: Download complete 2025-12-04T09:19:46.2858922Z 39cefc00ffed: Download complete 2025-12-04T09:19:46.3715397Z 6ae51eb61a32: Verifying Checksum 2025-12-04T09:19:46.3715735Z 6ae51eb61a32: Download complete 2025-12-04T09:19:46.4663726Z 1fd5341e66df: Verifying Checksum 2025-12-04T09:19:46.4664082Z 1fd5341e66df: Download complete 2025-12-04T09:19:46.6602909Z 72a7c87e35e4: Verifying Checksum 2025-12-04T09:19:46.6603342Z 72a7c87e35e4: Download complete 2025-12-04T09:19:46.7247407Z ec36862ac98e: Download complete 2025-12-04T09:19:47.3177908Z 05ddbf246e8a: Verifying Checksum 2025-12-04T09:19:47.3178863Z 05ddbf246e8a: Download complete 2025-12-04T09:19:55.0321359Z 148171691cd4: Verifying Checksum 2025-12-04T09:19:55.0321713Z 148171691cd4: Download complete 2025-12-04T09:20:31.5538197Z 35041ce524ac: Verifying Checksum 2025-12-04T09:20:31.5538623Z 35041ce524ac: Download complete 2025-12-04T09:21:06.1079227Z a13fcc1b90bb: Pull complete 2025-12-04T09:21:06.3286269Z 4f4fb700ef54: Pull complete 2025-12-04T09:21:06.5405864Z 549db4d6c618: Pull complete 2025-12-04T09:21:06.8044012Z 5c63528cb580: Pull complete 2025-12-04T09:21:07.0202322Z 75bd83b989a4: Pull complete 2025-12-04T09:21:07.3236364Z de6e78970f51: Pull complete 
2025-12-04T09:21:07.4639602Z e13ed7c7e473: Pull complete 2025-12-04T09:21:07.5994021Z 6e2949bcb741: Pull complete 2025-12-04T09:21:07.6775196Z 14d69d9aaec7: Pull complete 2025-12-04T09:21:07.8481633Z 5c02769dd8e5: Pull complete 2025-12-04T09:22:39.2137886Z 35041ce524ac: Pull complete 2025-12-04T09:22:39.4206554Z 2fa92dc5885e: Pull complete 2025-12-04T09:22:40.2673255Z 2b85eafbd92a: Pull complete 2025-12-04T09:22:40.4770511Z ff755a4ddad7: Pull complete 2025-12-04T09:22:40.6812386Z 09eb41bdf42d: Pull complete 2025-12-04T09:22:49.4302225Z 11ede4d59e93: Pull complete 2025-12-04T09:22:49.6407347Z 1283cd8f801a: Pull complete 2025-12-04T09:22:49.8454793Z 024fa855425f: Pull complete 2025-12-04T09:22:50.2823010Z 303e6747a62e: Pull complete 2025-12-04T09:22:50.4962300Z 3017cdf4838b: Pull complete 2025-12-04T09:22:50.9099480Z 6b6cd1c358e8: Pull complete 2025-12-04T09:22:51.1191903Z b2dd04501124: Pull complete 2025-12-04T09:22:51.3277095Z 55adc51fe589: Pull complete 2025-12-04T09:22:51.7673707Z a43ca0e4b837: Pull complete 2025-12-04T09:22:51.9938981Z b7212f17fd14: Pull complete 2025-12-04T09:22:52.2073946Z 083e42cac090: Pull complete 2025-12-04T09:22:52.6481425Z 0a00b784a4aa: Pull complete 2025-12-04T09:22:52.8653805Z c6173c779f7b: Pull complete 2025-12-04T09:22:56.6709554Z ed3d1e3387b9: Pull complete 2025-12-04T09:22:56.8891227Z b29343478586: Pull complete 2025-12-04T09:22:58.3145897Z c6f0520487fb: Pull complete 2025-12-04T09:23:59.5182792Z 148171691cd4: Pull complete 2025-12-04T09:23:59.5927006Z 2c666d30ed77: Pull complete 2025-12-04T09:23:59.7234329Z 5d8d3a0a98e0: Pull complete 2025-12-04T09:23:59.9560280Z b06bafce9e81: Pull complete 2025-12-04T09:24:00.4018162Z 15e0d7e4590d: Pull complete 2025-12-04T09:24:00.5988535Z a514bd1add31: Pull complete 2025-12-04T09:24:01.0067901Z 57b84ee60002: Pull complete 2025-12-04T09:24:01.4286387Z b8babeff6d81: Pull complete 2025-12-04T09:24:01.6440834Z 83779ddf6a85: Pull complete 2025-12-04T09:24:02.0596479Z 8b7620c0d736: Pull complete 
2025-12-04T09:24:02.3944507Z 3bcfa090e4ef: Pull complete 2025-12-04T09:24:02.5363985Z eb0504ec4d92: Pull complete 2025-12-04T09:24:02.7512373Z 15d0fec09d7b: Pull complete 2025-12-04T09:24:02.8952034Z cca81fcc62a9: Pull complete 2025-12-04T09:24:03.1778494Z b0b8f9b5c6ab: Pull complete 2025-12-04T09:24:03.3862628Z 0606ca4d47a8: Pull complete 2025-12-04T09:24:03.5430395Z 2f80a4e1b3b9: Pull complete 2025-12-04T09:24:03.5783340Z 35c916fb1bd0: Pull complete 2025-12-04T09:24:10.4119673Z 195537b7dafc: Pull complete 2025-12-04T09:24:10.6210457Z dc454fd3967e: Pull complete 2025-12-04T09:24:10.8358793Z 701b34f115fa: Pull complete 2025-12-04T09:24:11.0581011Z 39cefc00ffed: Pull complete 2025-12-04T09:24:11.2286198Z 6ae51eb61a32: Pull complete 2025-12-04T09:24:11.3612991Z 1fd5341e66df: Pull complete 2025-12-04T09:24:13.2011268Z 72a7c87e35e4: Pull complete 2025-12-04T09:24:13.4104513Z ec36862ac98e: Pull complete 2025-12-04T09:24:15.2256258Z 05ddbf246e8a: Pull complete 2025-12-04T09:24:15.5764513Z Digest: sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T09:24:15.6187607Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:24:15.6371344Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:24:15.6455273Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:24:15.6456438Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:24:15.6467484Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:24:15.6467872Z env: 2025-12-04T09:24:15.6468085Z GIT_DEFAULT_BRANCH: main 
2025-12-04T09:24:15.6468348Z ##[endgroup] 2025-12-04T09:24:15.6666918Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2025-12-04T09:24:15.6667334Z with: 2025-12-04T09:24:15.6667553Z driver-version: 580.82.07 2025-12-04T09:24:15.6667802Z env: 2025-12-04T09:24:15.6668008Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:24:15.6668267Z ##[endgroup] 2025-12-04T09:24:15.6786284Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:24:15.6787418Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:24:15.6796644Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:24:15.6797007Z env: 2025-12-04T09:24:15.6797210Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:24:15.6797446Z ##[endgroup] 2025-12-04T09:24:15.6877624Z ##[group]Run set -euo pipefail 2025-12-04T09:24:15.6877973Z set -euo pipefail 2025-12-04T09:24:15.6878275Z  2025-12-04T09:24:15.6878471Z has_gpu=false 2025-12-04T09:24:15.6878715Z devices="" 2025-12-04T09:24:15.6878932Z  2025-12-04T09:24:15.6879189Z if command -v nvidia-smi >/dev/null 2>&1; then 2025-12-04T09:24:15.6879631Z  if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:24:15.6880009Z  has_gpu=true 2025-12-04T09:24:15.6880292Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:24:15.6880595Z  fi 2025-12-04T09:24:15.6880803Z fi 2025-12-04T09:24:15.6880992Z  2025-12-04T09:24:15.6881202Z if [ "$has_gpu" = false ]; then 2025-12-04T09:24:15.6881592Z  if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:24:15.6881967Z  has_gpu=true 2025-12-04T09:24:15.6882249Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:24:15.6882549Z  fi 2025-12-04T09:24:15.6882745Z fi 2025-12-04T09:24:15.6882943Z  2025-12-04T09:24:15.6883240Z if [ "$has_gpu" = false ] && command -v lspci >/dev/null 2>&1; then 2025-12-04T09:24:15.6883741Z  
if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:24:15.6884147Z  has_gpu=true 2025-12-04T09:24:15.6884422Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:24:15.6884727Z  fi 2025-12-04T09:24:15.6884922Z fi 2025-12-04T09:24:15.6885111Z  2025-12-04T09:24:15.6885411Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT" 2025-12-04T09:24:15.6885944Z printf 'DETECTED_DEVICES<> "$GITHUB_OUTPUT" 2025-12-04T09:24:15.6894381Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:24:15.6894737Z env: 2025-12-04T09:24:15.6894941Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:24:15.6895188Z ##[endgroup] 2025-12-04T09:24:17.4378890Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:24:17.4379625Z if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:24:17.4380193Z  echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}" 2025-12-04T09:24:17.4380954Z  echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-12-04T09:24:17.4381642Z else 2025-12-04T09:24:17.4382052Z  echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}" 2025-12-04T09:24:17.4382577Z fi 2025-12-04T09:24:17.4395396Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:24:17.4395976Z env: 2025-12-04T09:24:17.4396295Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:24:17.4396924Z HAS_NVIDIA: true 2025-12-04T09:24:17.4397252Z ##[endgroup] 2025-12-04T09:24:17.4482662Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2025-12-04T09:24:17.4483084Z with: 2025-12-04T09:24:17.4483280Z timeout_minutes: 10 2025-12-04T09:24:17.4483518Z max_attempts: 3 2025-12-04T09:24:17.4512320Z command: # Is it disgusting to have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. 
/etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo" else # Amazon Linux 2 YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" fi sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y \ nvidia-container-toolkit-1.17.8 \ libnvidia-container-tools-1.17.8 \ libnvidia-container1-1.17.8 \ nvidia-container-toolkit-base-1.17.8 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x # Install nvidia-driver package if not installed status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)" if [ ! $? = 0 ] || [ ! "$status" = installed ]; then sudo apt-get install -y nvidia-container-toolkit-1.17.8 sudo systemctl restart docker fi ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). 
Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" # Turn off persistent mode so that the installation script can unload the kernel module sudo killall nvidia-persistenced || true else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. 
When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in # the case where the driver has already crashed as it still can get the driver version # and some basic information like the bus ID. However, the rest of the information # would be missing (ERR!), for example: # # +-----------------------------------------------------------------------------+ # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | # |-------------------------------+----------------------+----------------------+ # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | # | | | MIG M. | # |===============================+======================+======================| # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | # | | | ERR! 
| # +-------------------------------+----------------------+----------------------+ # # +-----------------------------------------------------------------------------+ # | Processes: | # | GPU GI CI PID Type Process name GPU Memory | # | ID ID Usage | # |=============================================================================| # +-----------------------------------------------------------------------------+ # # This should be reported as a failure instead as it will guarantee to fail when # Docker tries to run with --gpus all # # So, the correct check here is to query one of the missing piece of info like # GPU name, so that the command can fail accordingly nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 NVIDIA_SMI_STATUS=$? # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with # more than one GPUs. 
This just needs to be run once. The command fails # on subsequent runs and complains that the mode is already on, but that's # ok sudo nvidia-persistenced || true # This should show persistence mode ON nvidia-smi # check if the container-toolkit is correctly installed and CUDA is available inside a container docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi 2025-12-04T09:24:17.4540822Z retry_wait_seconds: 10 2025-12-04T09:24:17.4541088Z polling_interval_seconds: 1 2025-12-04T09:24:17.4541357Z warning_on_retry: true 2025-12-04T09:24:17.4541627Z continue_on_error: false 2025-12-04T09:24:17.4541872Z env: 2025-12-04T09:24:17.4542073Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:24:17.4542332Z HAS_NVIDIA_GPU: true 2025-12-04T09:24:17.4542639Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:24:17.4543000Z DRIVER_VERSION: 580.82.07 2025-12-04T09:24:17.4543261Z ##[endgroup] 2025-12-04T09:24:17.5743601Z == Installing nvidia driver NVIDIA-Linux-x86_64-580.82.07.run == 2025-12-04T09:24:17.5744459Z + pre_install_nvidia_driver_amzn2 2025-12-04T09:24:17.5746740Z + sudo yum remove -y nvidia-driver-latest-dkms 2025-12-04T09:24:18.2276428Z No match for argument: nvidia-driver-latest-dkms 2025-12-04T09:24:18.2276831Z No packages marked for removal. 2025-12-04T09:24:18.2342061Z Dependencies resolved. 2025-12-04T09:24:18.2352016Z Nothing to do. 2025-12-04T09:24:18.2352670Z Complete! 
2025-12-04T09:24:18.2994144Z + install_nvidia_driver_common 2025-12-04T09:24:18.2998503Z + echo 'Before installing NVIDIA driver' 2025-12-04T09:24:18.2999153Z + lspci 2025-12-04T09:24:18.3000701Z Before installing NVIDIA driver 2025-12-04T09:24:18.4340337Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:24:18.4341017Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:24:18.4341613Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:24:18.4342180Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:24:18.4342680Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:24:18.4343269Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:24:18.4343784Z 00:1e.0 3D controller: NVIDIA Corporation GA102GL [A10G] (rev a1) 2025-12-04T09:24:18.4344332Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. 
NVMe SSD Controller 2025-12-04T09:24:18.4344772Z + lsmod 2025-12-04T09:24:18.4398025Z Module Size Used by 2025-12-04T09:24:18.4398805Z nvidia_uvm 1925120 0 2025-12-04T09:24:18.4399100Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:24:18.4399391Z drm 602112 1 nvidia 2025-12-04T09:24:18.4399705Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:24:18.4400108Z backlight 24576 1 drm 2025-12-04T09:24:18.4400495Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:24:18.4400862Z xt_conntrack 16384 1 2025-12-04T09:24:18.4401128Z nft_chain_nat 16384 3 2025-12-04T09:24:18.4401664Z xt_MASQUERADE 20480 1 2025-12-04T09:24:18.4402073Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:24:18.4402524Z nf_conntrack_netlink 57344 0 2025-12-04T09:24:18.4403039Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:24:18.4403498Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:24:18.4403816Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:24:18.4404120Z xfrm_user 57344 1 2025-12-04T09:24:18.4404379Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:24:18.4404672Z xt_addrtype 16384 2 2025-12-04T09:24:18.4404930Z nft_compat 20480 4 2025-12-04T09:24:18.4405232Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:24:18.4405669Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:24:18.4406066Z br_netfilter 36864 0 2025-12-04T09:24:18.4406337Z bridge 323584 1 br_netfilter 2025-12-04T09:24:18.4406647Z stp 16384 1 bridge 2025-12-04T09:24:18.4406939Z llc 16384 2 bridge,stp 2025-12-04T09:24:18.4407226Z overlay 167936 0 2025-12-04T09:24:18.4407473Z tls 139264 0 2025-12-04T09:24:18.4407981Z nls_ascii 16384 1 2025-12-04T09:24:18.4408255Z nls_cp437 20480 1 2025-12-04T09:24:18.4408495Z vfat 24576 1 2025-12-04T09:24:18.4408774Z fat 86016 1 vfat 2025-12-04T09:24:18.4409079Z sunrpc 700416 1 2025-12-04T09:24:18.4409318Z i8042 45056 0 2025-12-04T09:24:18.4409574Z ghash_clmulni_intel 16384 0 2025-12-04T09:24:18.4409836Z serio 28672 3 i8042 
2025-12-04T09:24:18.4410097Z ena 184320 0 2025-12-04T09:24:18.4410346Z button 24576 0 2025-12-04T09:24:18.4410597Z sch_fq_codel 20480 17 2025-12-04T09:24:18.4410843Z fuse 184320 1 2025-12-04T09:24:18.4411092Z loop 36864 0 2025-12-04T09:24:18.4411339Z dm_mod 188416 0 2025-12-04T09:24:18.4411586Z configfs 57344 1 2025-12-04T09:24:18.4411831Z dmi_sysfs 20480 0 2025-12-04T09:24:18.4412082Z crc32_pclmul 16384 0 2025-12-04T09:24:18.4412332Z crc32c_intel 24576 0 2025-12-04T09:24:18.4412573Z efivarfs 24576 1 2025-12-04T09:24:18.4412827Z + modinfo nvidia 2025-12-04T09:24:18.4420470Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:24:18.4421139Z import_ns: DMA_BUF 2025-12-04T09:24:18.4421460Z alias: char-major-195-* 2025-12-04T09:24:18.4421794Z version: 580.82.07 2025-12-04T09:24:18.4422036Z supported: external 2025-12-04T09:24:18.4422275Z license: Dual MIT/GPL 2025-12-04T09:24:18.4422567Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:24:18.4423027Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:24:18.4423475Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:24:18.4423916Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:24:18.4424279Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:24:18.4424632Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:24:18.4424980Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:24:18.4425329Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:24:18.4425859Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:24:18.4426198Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:24:18.4426604Z depends: i2c-core,drm 2025-12-04T09:24:18.4426954Z retpoline: Y 2025-12-04T09:24:18.4427232Z name: nvidia 2025-12-04T09:24:18.4427682Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:24:18.4428187Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:24:18.4428682Z parm: 
NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:24:18.4429280Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:24:18.4429599Z parm: NVreg_RmLogonRC:int 2025-12-04T09:24:18.4429976Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:24:18.4430406Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:24:18.4430818Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:24:18.4431231Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:24:18.4431625Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:24:18.4432029Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:24:18.4432370Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:24:18.4432676Z parm: NVreg_EnableMSI:int 2025-12-04T09:24:18.4432981Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:24:18.4433359Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:24:18.4433895Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:24:18.4434416Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:24:18.4434879Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:24:18.4435302Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:24:18.4435735Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:24:18.4436157Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:24:18.4436507Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:24:18.4436889Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:24:18.4437268Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:24:18.4437617Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:24:18.4437946Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:24:18.4438283Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:24:18.4438637Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:24:18.4438982Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:24:18.4439339Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:24:18.4439705Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:24:18.4440074Z parm: NVreg_RegisterPlatformDeviceDriver:int 
2025-12-04T09:24:18.4440444Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:24:18.4440781Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:24:18.4441133Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:24:18.4441502Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:24:18.4441856Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:24:18.4442204Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:24:18.4442552Z parm: NVreg_RmMsg:charp 2025-12-04T09:24:18.4442845Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:24:18.4443169Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:24:18.4443503Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:24:18.4454510Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:24:18.4454918Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:24:18.4455298Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:24:18.4455661Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:24:18.4456001Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:24:18.4456359Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:24:18.4456705Z parm: rm_firmware_active:charp 2025-12-04T09:24:18.4457139Z + HAS_NVIDIA_DRIVER=0 2025-12-04T09:24:18.4457387Z ++ command -v nvidia-smi 2025-12-04T09:24:18.4457641Z + '[' -x /usr/bin/nvidia-smi ']' 2025-12-04T09:24:18.4457898Z + set +e 2025-12-04T09:24:18.4458216Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-12-04T09:24:20.1761839Z + INSTALLED_DRIVER_VERSION=580.82.07 2025-12-04T09:24:20.1762323Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:24:20.1762644Z + '[' 0 -ne 0 ']' 2025-12-04T09:24:20.1762956Z + '[' 580.82.07 '!=' 580.82.07 ']' 2025-12-04T09:24:20.1763686Z + HAS_NVIDIA_DRIVER=1 2025-12-04T09:24:20.1764229Z + echo 'NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation' 2025-12-04T09:24:20.1764730Z + set -e 2025-12-04T09:24:20.1764917Z + '[' 1 -eq 0 ']' 2025-12-04T09:24:20.1765326Z NVIDIA driver (580.82.07) has already been installed. 
Skipping NVIDIA driver installation 2025-12-04T09:24:20.1767265Z + post_install_nvidia_driver_common 2025-12-04T09:24:20.1770591Z + sudo modprobe nvidia 2025-12-04T09:24:20.3014714Z + echo 'After installing NVIDIA driver' 2025-12-04T09:24:20.3015185Z + lspci 2025-12-04T09:24:20.3015415Z After installing NVIDIA driver 2025-12-04T09:24:20.3139636Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:24:20.3140790Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:24:20.3141860Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:24:20.3142875Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:24:20.3143905Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:24:20.3144942Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:24:20.3146192Z 00:1e.0 3D controller: NVIDIA Corporation GA102GL [A10G] (rev a1) 2025-12-04T09:24:20.3147106Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. 
NVMe SSD Controller 2025-12-04T09:24:20.3147874Z + lsmod 2025-12-04T09:24:20.3184674Z Module Size Used by 2025-12-04T09:24:20.3185081Z nvidia_uvm 1925120 0 2025-12-04T09:24:20.3185439Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:24:20.3185846Z drm 602112 1 nvidia 2025-12-04T09:24:20.3186236Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:24:20.3186554Z backlight 24576 1 drm 2025-12-04T09:24:20.3186889Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:24:20.3187294Z xt_conntrack 16384 1 2025-12-04T09:24:20.3187654Z nft_chain_nat 16384 3 2025-12-04T09:24:20.3188009Z xt_MASQUERADE 20480 1 2025-12-04T09:24:20.3188418Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:24:20.3188781Z nf_conntrack_netlink 57344 0 2025-12-04T09:24:20.3189196Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:24:20.3189665Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:24:20.3189992Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:24:20.3190287Z xfrm_user 57344 1 2025-12-04T09:24:20.3190556Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:24:20.3190852Z xt_addrtype 16384 2 2025-12-04T09:24:20.3191108Z nft_compat 20480 4 2025-12-04T09:24:20.3191418Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:24:20.3191857Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:24:20.3192251Z br_netfilter 36864 0 2025-12-04T09:24:20.3192530Z bridge 323584 1 br_netfilter 2025-12-04T09:24:20.3192834Z stp 16384 1 bridge 2025-12-04T09:24:20.3193128Z llc 16384 2 bridge,stp 2025-12-04T09:24:20.3193423Z overlay 167936 0 2025-12-04T09:24:20.3193671Z tls 139264 0 2025-12-04T09:24:20.3193919Z nls_ascii 16384 1 2025-12-04T09:24:20.3194192Z nls_cp437 20480 1 2025-12-04T09:24:20.3194722Z vfat 24576 1 2025-12-04T09:24:20.3194988Z fat 86016 1 vfat 2025-12-04T09:24:20.3195264Z sunrpc 700416 1 2025-12-04T09:24:20.3195504Z i8042 45056 0 2025-12-04T09:24:20.3195757Z ghash_clmulni_intel 16384 0 2025-12-04T09:24:20.3196019Z serio 28672 3 i8042 
2025-12-04T09:24:20.3196290Z ena 184320 0 2025-12-04T09:24:20.3196529Z button 24576 0 2025-12-04T09:24:20.3196780Z sch_fq_codel 20480 17 2025-12-04T09:24:20.3197191Z fuse 184320 1 2025-12-04T09:24:20.3197427Z loop 36864 0 2025-12-04T09:24:20.3197675Z dm_mod 188416 0 2025-12-04T09:24:20.3197922Z configfs 57344 1 2025-12-04T09:24:20.3198162Z dmi_sysfs 20480 0 2025-12-04T09:24:20.3198416Z crc32_pclmul 16384 0 2025-12-04T09:24:20.3198671Z crc32c_intel 24576 0 2025-12-04T09:24:20.3198920Z efivarfs 24576 1 2025-12-04T09:24:20.3199203Z + modinfo nvidia 2025-12-04T09:24:20.3204365Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:24:20.3205034Z import_ns: DMA_BUF 2025-12-04T09:24:20.3205363Z alias: char-major-195-* 2025-12-04T09:24:20.3205667Z version: 580.82.07 2025-12-04T09:24:20.3205909Z supported: external 2025-12-04T09:24:20.3206154Z license: Dual MIT/GPL 2025-12-04T09:24:20.3206432Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:24:20.3206786Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:24:20.3207115Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:24:20.3207442Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:24:20.3208047Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:24:20.3208406Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:24:20.3208756Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:24:20.3209127Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:24:20.3209521Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:24:20.3209984Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:24:20.3210403Z depends: i2c-core,drm 2025-12-04T09:24:20.3210740Z retpoline: Y 2025-12-04T09:24:20.3211032Z name: nvidia 2025-12-04T09:24:20.3211418Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:24:20.3211917Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:24:20.3212393Z parm: 
NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:24:20.3212829Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:24:20.3213147Z parm: NVreg_RmLogonRC:int 2025-12-04T09:24:20.3213456Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:24:20.3213775Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:24:20.3214086Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:24:20.3214403Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:24:20.3214775Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:24:20.3215171Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:24:20.3215515Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:24:20.3215825Z parm: NVreg_EnableMSI:int 2025-12-04T09:24:20.3216128Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:24:20.3216508Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:24:20.3216929Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:24:20.3217317Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:24:20.3217751Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:24:20.3218176Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:24:20.3218609Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:24:20.3219302Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:24:20.3219651Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:24:20.3220030Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:24:20.3220411Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:24:20.3220761Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:24:20.3221086Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:24:20.3221419Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:24:20.3221749Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:24:20.3222182Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:24:20.3222536Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:24:20.3222901Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:24:20.3223263Z parm: NVreg_RegisterPlatformDeviceDriver:int 
2025-12-04T09:24:20.3223635Z parm: NVreg_EnableResizableBar:int
2025-12-04T09:24:20.3223972Z parm: NVreg_EnableDbgBreakpoint:int
2025-12-04T09:24:20.3224328Z parm: NVreg_EnableNonblockingOpen:int
2025-12-04T09:24:20.3224693Z parm: NVreg_CoherentGPUMemoryMode:charp
2025-12-04T09:24:20.3225039Z parm: NVreg_RegistryDwords:charp
2025-12-04T09:24:20.3225384Z parm: NVreg_RegistryDwordsPerDevice:charp
2025-12-04T09:24:20.3225723Z parm: NVreg_RmMsg:charp
2025-12-04T09:24:20.3226012Z parm: NVreg_GpuBlacklist:charp
2025-12-04T09:24:20.3226341Z parm: NVreg_TemporaryFilePath:charp
2025-12-04T09:24:20.3226670Z parm: NVreg_ExcludedGpus:charp
2025-12-04T09:24:20.3226998Z parm: NVreg_DmaRemapPeerMmio:int
2025-12-04T09:24:20.3227328Z parm: NVreg_RmNvlinkBandwidth:charp
2025-12-04T09:24:20.3227696Z parm: NVreg_RmNvlinkBandwidthLinkCount:int
2025-12-04T09:24:20.3228055Z parm: NVreg_ImexChannelCount:int
2025-12-04T09:24:20.3228383Z parm: NVreg_CreateImexChannel0:int
2025-12-04T09:24:20.3228740Z parm: NVreg_GrdmaPciTopoCheckOverride:int
2025-12-04T09:24:20.3229119Z parm: rm_firmware_active:charp
2025-12-04T09:24:20.3229430Z + set +e
2025-12-04T09:24:20.3229619Z + nvidia-smi
2025-12-04T09:24:21.7741306Z Thu Dec 4 09:24:21 2025
2025-12-04T09:24:21.7741726Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:21.7742258Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
2025-12-04T09:24:21.7742761Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:24:21.7743317Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:24:21.7743883Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2025-12-04T09:24:21.7744336Z | | | MIG M. |
2025-12-04T09:24:21.7744692Z |=========================================+========================+======================|
2025-12-04T09:24:21.7830489Z | 0 NVIDIA A10G Off | 00000000:00:1E.0 Off | 0 |
2025-12-04T09:24:21.7830990Z | 0% 25C P0 59W / 300W | 0MiB / 23028MiB | 4% Default |
2025-12-04T09:24:21.7831484Z | | | N/A |
2025-12-04T09:24:21.7831893Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:24:21.7832621Z
2025-12-04T09:24:21.7832802Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:21.7833373Z | Processes: |
2025-12-04T09:24:21.7833848Z | GPU GI CI PID Type Process name GPU Memory |
2025-12-04T09:24:21.7834280Z | ID ID Usage |
2025-12-04T09:24:21.7834951Z |=========================================================================================|
2025-12-04T09:24:21.7836529Z | No running processes found |
2025-12-04T09:24:21.7837151Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:22.2072746Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T09:24:23.6606009Z NVIDIA A10G
2025-12-04T09:24:23.9341735Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:24:23.9342037Z + '[' 0 -eq 0 ']'
2025-12-04T09:24:23.9342274Z + echo 'INFO: Ignoring allowed status 0'
2025-12-04T09:24:23.9342571Z + set -e
2025-12-04T09:24:23.9342777Z INFO: Ignoring allowed status 0
2025-12-04T09:24:23.9351362Z == Installing nvidia container toolkit for amzn2023 ==
2025-12-04T09:24:23.9355343Z + sudo yum install -y yum-utils
2025-12-04T09:24:24.3819456Z Last metadata expiration check: 0:08:49 ago on Thu Dec 4 09:15:35 2025.
2025-12-04T09:24:24.4088182Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed.
2025-12-04T09:24:24.4671727Z Dependencies resolved.
2025-12-04T09:24:24.4965010Z Nothing to do.
2025-12-04T09:24:24.4965400Z Complete!
2025-12-04T09:24:24.5974731Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]]
2025-12-04T09:24:24.5975333Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:24:24.5976277Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:24:24.8822650Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:24:24.9367664Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8
2025-12-04T09:24:25.4654483Z nvidia-container-toolkit 18 kB/s | 833 B 00:00
2025-12-04T09:24:25.5493965Z Dependencies resolved.
2025-12-04T09:24:25.5781378Z ================================================================================
2025-12-04T09:24:25.5781874Z Package Arch Version Repository Size
2025-12-04T09:24:25.5782301Z ================================================================================
2025-12-04T09:24:25.5782635Z Downgrading:
2025-12-04T09:24:25.5783046Z libnvidia-container-tools x86_64 1.17.8-1 nvidia-container-toolkit 40 k
2025-12-04T09:24:25.5783662Z libnvidia-container1 x86_64 1.17.8-1 nvidia-container-toolkit 1.0 M
2025-12-04T09:24:25.5784268Z nvidia-container-toolkit x86_64 1.17.8-1 nvidia-container-toolkit 1.2 M
2025-12-04T09:24:25.5784902Z nvidia-container-toolkit-base x86_64 1.17.8-1 nvidia-container-toolkit 5.8 M
2025-12-04T09:24:25.5785289Z
2025-12-04T09:24:25.5785386Z Transaction Summary
2025-12-04T09:24:25.5785630Z ================================================================================
2025-12-04T09:24:25.5785966Z Downgrade 4 Packages
2025-12-04T09:24:25.5786117Z
2025-12-04T09:24:25.5786227Z Total download size: 8.0 M
2025-12-04T09:24:25.5786889Z Downloading Packages:
2025-12-04T09:24:25.6226233Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 951 kB/s | 40 kB 00:00
2025-12-04T09:24:25.6667973Z (2/4): libnvidia-container1-1.17.8-1.x86_64.rpm 11 MB/s | 1.0 MB 00:00
2025-12-04T09:24:25.7178611Z (3/4): nvidia-container-toolkit-1.17.8-1.x86_64 9.0 MB/s | 1.2 MB 00:00
2025-12-04T09:24:25.8462309Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x 26 MB/s | 5.8 MB 00:00
2025-12-04T09:24:25.8471942Z --------------------------------------------------------------------------------
2025-12-04T09:24:25.8474866Z Total 30 MB/s | 8.0 MB 00:00
2025-12-04T09:24:25.8477753Z Running transaction check
2025-12-04T09:24:25.8597047Z Transaction check succeeded.
2025-12-04T09:24:25.8597622Z Running transaction test
2025-12-04T09:24:25.9101319Z Transaction test succeeded.
2025-12-04T09:24:25.9104582Z Running transaction
2025-12-04T09:24:26.7566295Z Preparing : 1/1
2025-12-04T09:24:26.8884432Z Downgrading : nvidia-container-toolkit-base-1.17.8-1.x86_64 1/8
2025-12-04T09:24:26.9142871Z Downgrading : libnvidia-container1-1.17.8-1.x86_64 2/8
2025-12-04T09:24:26.9952998Z Running scriptlet: libnvidia-container1-1.17.8-1.x86_64 2/8
2025-12-04T09:24:27.1282295Z Downgrading : libnvidia-container-tools-1.17.8-1.x86_64 3/8
2025-12-04T09:24:27.1589281Z Downgrading : nvidia-container-toolkit-1.17.8-1.x86_64 4/8
2025-12-04T09:24:27.2154121Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 4/8
2025-12-04T09:24:27.2229654Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8
2025-12-04T09:24:27.2230276Z Cleanup : nvidia-container-toolkit-1.18.1-1.x86_64 5/8
2025-12-04T09:24:27.2599947Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8
2025-12-04T09:24:27.2664651Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8
2025-12-04T09:24:27.2665264Z Cleanup : libnvidia-container-tools-1.18.1-1.x86_64 6/8
2025-12-04T09:24:27.3046373Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8
2025-12-04T09:24:27.3128728Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8
2025-12-04T09:24:27.3129348Z Cleanup : libnvidia-container1-1.18.1-1.x86_64 7/8
2025-12-04T09:24:27.3532863Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8
2025-12-04T09:24:27.3610370Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:24:27.3611359Z Cleanup : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:24:27.4005285Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:24:27.4690526Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 8/8
2025-12-04T09:24:51.3335602Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:24:51.3338100Z Verifying : libnvidia-container-tools-1.17.8-1.x86_64 1/8
2025-12-04T09:24:51.3339302Z Verifying : libnvidia-container-tools-1.18.1-1.x86_64 2/8
2025-12-04T09:24:51.3340030Z Verifying : libnvidia-container1-1.17.8-1.x86_64 3/8
2025-12-04T09:24:51.3340673Z Verifying : libnvidia-container1-1.18.1-1.x86_64 4/8
2025-12-04T09:24:51.3342601Z Verifying : nvidia-container-toolkit-1.17.8-1.x86_64 5/8
2025-12-04T09:24:51.3343179Z Verifying : nvidia-container-toolkit-1.18.1-1.x86_64 6/8
2025-12-04T09:24:51.3343760Z Verifying : nvidia-container-toolkit-base-1.17.8-1.x86_64 7/8
2025-12-04T09:24:51.4893606Z Verifying : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
================================================================================
2025-12-04T09:24:51.4894221Z WARNING:
2025-12-04T09:24:51.4894463Z A newer release of "Amazon Linux" is available.
2025-12-04T09:24:51.4894712Z
2025-12-04T09:24:51.4894802Z Available Versions:
2025-12-04T09:24:51.4894953Z
2025-12-04T09:24:51.4895067Z Version 2023.9.20250929:
2025-12-04T09:24:51.4895380Z Run the following command to upgrade to 2023.9.20250929:
2025-12-04T09:24:51.4895658Z
2025-12-04T09:24:51.4895782Z dnf upgrade --releasever=2023.9.20250929
2025-12-04T09:24:51.4896012Z
2025-12-04T09:24:51.4896096Z Release notes:
2025-12-04T09:24:51.4896531Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html
2025-12-04T09:24:51.4896936Z
2025-12-04T09:24:51.4897321Z Version 2023.9.20251014:
2025-12-04T09:24:51.4897646Z Run the following command to upgrade to 2023.9.20251014:
2025-12-04T09:24:51.4897923Z
2025-12-04T09:24:51.4898040Z dnf upgrade --releasever=2023.9.20251014
2025-12-04T09:24:51.4898263Z
2025-12-04T09:24:51.4898354Z Release notes:
2025-12-04T09:24:51.4898765Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html
2025-12-04T09:24:51.4899290Z
2025-12-04T09:24:51.4899378Z Version 2023.9.20251020:
2025-12-04T09:24:51.4899883Z Run the following command to upgrade to 2023.9.20251020:
2025-12-04T09:24:51.4900150Z
2025-12-04T09:24:51.4900263Z dnf upgrade --releasever=2023.9.20251020
2025-12-04T09:24:51.4900490Z
2025-12-04T09:24:51.4900573Z Release notes:
2025-12-04T09:24:51.4900984Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html
2025-12-04T09:24:51.4901377Z
2025-12-04T09:24:51.4901469Z Version 2023.9.20251027:
2025-12-04T09:24:51.4901782Z Run the following command to upgrade to 2023.9.20251027:
2025-12-04T09:24:51.4902057Z
2025-12-04T09:24:51.4902172Z dnf upgrade --releasever=2023.9.20251027
2025-12-04T09:24:51.4902391Z
2025-12-04T09:24:51.4902477Z Release notes:
2025-12-04T09:24:51.4902888Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html
2025-12-04T09:24:51.4903282Z
2025-12-04T09:24:51.4903369Z Version 2023.9.20251105:
2025-12-04T09:24:51.4903682Z Run the following command to upgrade to 2023.9.20251105:
2025-12-04T09:24:51.4903954Z
2025-12-04T09:24:51.4904073Z dnf upgrade --releasever=2023.9.20251105
2025-12-04T09:24:51.4904293Z
2025-12-04T09:24:51.4904376Z Release notes:
2025-12-04T09:24:51.4904783Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html
2025-12-04T09:24:51.4905185Z
2025-12-04T09:24:51.4905272Z Version 2023.9.20251110:
2025-12-04T09:24:51.4905589Z Run the following command to upgrade to 2023.9.20251110:
2025-12-04T09:24:51.4905862Z
2025-12-04T09:24:51.4905975Z dnf upgrade --releasever=2023.9.20251110
2025-12-04T09:24:51.4906204Z
2025-12-04T09:24:51.4906288Z Release notes:
2025-12-04T09:24:51.4906698Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html
2025-12-04T09:24:51.4907092Z
2025-12-04T09:24:51.4907189Z Version 2023.9.20251117:
2025-12-04T09:24:51.4907500Z Run the following command to upgrade to 2023.9.20251117:
2025-12-04T09:24:51.4908123Z
2025-12-04T09:24:51.4908285Z dnf upgrade --releasever=2023.9.20251117
2025-12-04T09:24:51.4908541Z
2025-12-04T09:24:51.4908642Z Release notes:
2025-12-04T09:24:51.4909054Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html
2025-12-04T09:24:51.4909458Z
2025-12-04T09:24:51.4909571Z ================================================================================
2025-12-04T09:24:51.5485526Z
2025-12-04T09:24:51.5485675Z
2025-12-04T09:24:51.5486061Z Downgraded:
2025-12-04T09:24:51.5486599Z libnvidia-container-tools-1.17.8-1.x86_64
2025-12-04T09:24:51.5487417Z libnvidia-container1-1.17.8-1.x86_64
2025-12-04T09:24:51.5488241Z nvidia-container-toolkit-1.17.8-1.x86_64
2025-12-04T09:24:51.5489094Z nvidia-container-toolkit-base-1.17.8-1.x86_64
2025-12-04T09:24:51.5489579Z
2025-12-04T09:24:51.5489699Z Complete!
2025-12-04T09:24:51.6253014Z + sudo systemctl restart docker
2025-12-04T09:24:58.6056308Z Thu Dec 4 09:24:58 2025
2025-12-04T09:24:58.6056718Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:58.6057252Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
2025-12-04T09:24:58.6057772Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:24:58.6058667Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:24:58.6059339Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2025-12-04T09:24:58.6059805Z | | | MIG M. |
2025-12-04T09:24:58.6060153Z |=========================================+========================+======================|
2025-12-04T09:24:58.6151737Z | 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
2025-12-04T09:24:58.6152451Z | 0% 25C P0 55W / 300W | 0MiB / 23028MiB | 4% Default |
2025-12-04T09:24:58.6152853Z | | | N/A |
2025-12-04T09:24:58.6153373Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:24:58.6153782Z
2025-12-04T09:24:58.6154001Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:58.6154464Z | Processes: |
2025-12-04T09:24:58.6154933Z | GPU GI CI PID Type Process name GPU Memory |
2025-12-04T09:24:58.6155363Z | ID ID Usage |
2025-12-04T09:24:58.6155730Z |=========================================================================================|
2025-12-04T09:24:58.6157493Z | No running processes found |
2025-12-04T09:24:58.6158026Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:24:58.7881908Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally
2025-12-04T09:24:58.9464443Z 3.13: Pulling from docker/library/python
2025-12-04T09:24:59.0274308Z 53c88f1dfeb7: Pulling fs layer
2025-12-04T09:24:59.0274739Z eae668646f44: Pulling fs layer
2025-12-04T09:24:59.0275051Z ff2e6e687b6c: Pulling fs layer
2025-12-04T09:24:59.0275326Z 7c40a3faff76: Pulling fs layer
2025-12-04T09:24:59.0275593Z 967a3b1c8fef: Pulling fs layer
2025-12-04T09:24:59.0275854Z a64e1a44f22a: Pulling fs layer
2025-12-04T09:24:59.0276117Z 52655f8a5bcc: Pulling fs layer
2025-12-04T09:24:59.0276386Z 967a3b1c8fef: Waiting
2025-12-04T09:24:59.0276607Z a64e1a44f22a: Waiting
2025-12-04T09:24:59.0276831Z 7c40a3faff76: Waiting
2025-12-04T09:24:59.0277048Z 52655f8a5bcc: Waiting
2025-12-04T09:24:59.1616775Z eae668646f44: Verifying Checksum
2025-12-04T09:24:59.1617179Z eae668646f44: Download complete
2025-12-04T09:24:59.2038458Z 53c88f1dfeb7: Verifying Checksum
2025-12-04T09:24:59.2038844Z 53c88f1dfeb7: Download complete
2025-12-04T09:24:59.2584533Z 967a3b1c8fef: Verifying Checksum
2025-12-04T09:24:59.2584879Z 967a3b1c8fef: Download complete
2025-12-04T09:24:59.2728254Z ff2e6e687b6c: Verifying Checksum
2025-12-04T09:24:59.2728626Z ff2e6e687b6c: Download complete
2025-12-04T09:24:59.3530703Z 52655f8a5bcc: Verifying Checksum
2025-12-04T09:24:59.3531044Z 52655f8a5bcc: Download complete
2025-12-04T09:24:59.3875869Z a64e1a44f22a: Verifying Checksum
2025-12-04T09:24:59.3876197Z a64e1a44f22a: Download complete
2025-12-04T09:24:59.9818936Z 7c40a3faff76: Verifying Checksum
2025-12-04T09:24:59.9819446Z 7c40a3faff76: Download complete
2025-12-04T09:25:01.0106959Z 53c88f1dfeb7: Pull complete
2025-12-04T09:25:01.7359545Z eae668646f44: Pull complete
2025-12-04T09:25:04.2578581Z ff2e6e687b6c: Pull complete
2025-12-04T09:25:10.9948548Z 7c40a3faff76: Pull complete
2025-12-04T09:25:11.2831292Z 967a3b1c8fef: Pull complete
2025-12-04T09:25:12.0497775Z a64e1a44f22a: Pull complete
2025-12-04T09:25:12.0722482Z 52655f8a5bcc: Pull complete
2025-12-04T09:25:12.0863690Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T09:25:12.0903246Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13
2025-12-04T09:25:19.1743311Z Thu Dec 4 09:25:19 2025
2025-12-04T09:25:19.1743728Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:25:19.1744261Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
2025-12-04T09:25:19.1744782Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:25:19.1745309Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:25:19.1748006Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2025-12-04T09:25:19.1748466Z | | | MIG M. |
2025-12-04T09:25:19.1748819Z |=========================================+========================+======================|
2025-12-04T09:25:19.1894109Z | 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
2025-12-04T09:25:19.1894600Z | 0% 22C P8 10W / 300W | 0MiB / 23028MiB | 0% Default |
2025-12-04T09:25:19.1895005Z | | | N/A |
2025-12-04T09:25:19.1895415Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:25:19.1898614Z
2025-12-04T09:25:19.1899420Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:25:19.1900019Z | Processes: |
2025-12-04T09:25:19.1900596Z | GPU GI CI PID Type Process name GPU Memory |
2025-12-04T09:25:19.1901036Z | ID ID Usage |
2025-12-04T09:25:19.1901467Z |=========================================================================================|
2025-12-04T09:25:19.1905329Z | No running processes found |
2025-12-04T09:25:19.1905938Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:25:20.5885896Z Command completed after 1 attempt(s).
2025-12-04T09:25:20.5990580Z Prepare all required actions
2025-12-04T09:25:20.6021779Z ##[group]Run ./.github/actions/get-workflow-job-id
2025-12-04T09:25:20.6022115Z with:
2025-12-04T09:25:20.6022835Z github-token: ***
2025-12-04T09:25:20.6023057Z env:
2025-12-04T09:25:20.6023269Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:20.6023531Z HAS_NVIDIA_GPU: true
2025-12-04T09:25:20.6023825Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:20.6024178Z ##[endgroup]
2025-12-04T09:25:20.6039516Z ##[group]Run set -eux
2025-12-04T09:25:20.6039758Z set -eux
2025-12-04T09:25:20.6040193Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
2025-12-04T09:25:20.6055483Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:20.6055855Z env:
2025-12-04T09:25:20.6056062Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:20.6056321Z HAS_NVIDIA_GPU: true
2025-12-04T09:25:20.6056667Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:20.6057156Z GITHUB_TOKEN: ***
2025-12-04T09:25:20.6057377Z ##[endgroup]
2025-12-04T09:25:20.6100608Z + python3 .github/scripts/get_workflow_job_id.py 19922826259 i-0f694664a515f0ebd
2025-12-04T09:25:22.3437664Z Setting output job-id=57118183212
2025-12-04T09:25:22.3438583Z Setting output job-name=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:25:22.3553224Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:25:22.3553976Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:25:22.3554956Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
2025-12-04T09:25:22.3555835Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:25:22.3565624Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:22.3565992Z env:
2025-12-04T09:25:22.3566208Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:22.3566476Z HAS_NVIDIA_GPU: true
2025-12-04T09:25:22.3566785Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:22.3567324Z JOB_ID: 57118183212
2025-12-04T09:25:22.3568038Z JOB_NAME: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check)
2025-12-04T09:25:22.3568785Z WORKFLOW_NAME: periodic
2025-12-04T09:25:22.3569051Z WORKFLOW_RUN_ID: 19922826259
2025-12-04T09:25:22.3569328Z MONITOR_LOG_INTERVAL: 5
2025-12-04T09:25:22.3569597Z MONITOR_DATA_COLLECT_INTERVAL: 1
2025-12-04T09:25:22.3569888Z ##[endgroup]
2025-12-04T09:25:22.6485804Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:25:23.0626080Z Collecting psutil==5.9.8
2025-12-04T09:25:23.0781041Z Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
2025-12-04T09:25:23.1588570Z Collecting dataclasses_json==0.6.7
2025-12-04T09:25:23.1619865Z Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
2025-12-04T09:25:23.1917542Z Collecting nvidia-ml-py==11.525.84
2025-12-04T09:25:23.1952343Z Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB)
2025-12-04T09:25:23.3254254Z Collecting marshmallow<4.0.0,>=3.18.0
2025-12-04T09:25:23.3286249Z Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB)
2025-12-04T09:25:23.3529199Z Collecting typing-inspect<1,>=0.4.0
2025-12-04T09:25:23.3560478Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
2025-12-04T09:25:23.4143696Z Collecting packaging>=17.0
2025-12-04T09:25:23.4174379Z Downloading packaging-25.0-py3-none-any.whl (66 kB)
2025-12-04T09:25:23.4417285Z Collecting mypy-extensions>=0.3.0
2025-12-04T09:25:23.4447745Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB)
2025-12-04T09:25:23.4968224Z Collecting typing-extensions>=3.7.4
2025-12-04T09:25:23.5000889Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
2025-12-04T09:25:23.5928431Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json
2025-12-04T09:25:23.8734648Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0
2025-12-04T09:25:24.0712250Z Prepare all required actions
2025-12-04T09:25:24.0712609Z Getting action download info
2025-12-04T09:25:24.3664923Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:25:24.6197110Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093)
2025-12-04T09:25:25.0137652Z ##[group]Run ./.github/actions/download-build-artifacts
2025-12-04T09:25:25.0138711Z with:
2025-12-04T09:25:25.0139219Z name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T09:25:25.0140044Z s3-bucket: gha-artifacts
2025-12-04T09:25:25.0140397Z env:
2025-12-04T09:25:25.0141231Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:25.0141745Z HAS_NVIDIA_GPU: true
2025-12-04T09:25:25.0142127Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:25.0142890Z ##[endgroup]
2025-12-04T09:25:25.0180788Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:25:25.0181440Z with:
2025-12-04T09:25:25.0181767Z name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck
2025-12-04T09:25:25.0182286Z s3-bucket: gha-artifacts
2025-12-04T09:25:25.0182659Z region: us-east-1
2025-12-04T09:25:25.0182984Z env:
2025-12-04T09:25:25.0183282Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:25.0183641Z HAS_NVIDIA_GPU: true
2025-12-04T09:25:25.0184061Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:25.0184500Z ##[endgroup]
2025-12-04T09:25:25.4968891Z (node:59427) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:25:25.4969416Z
2025-12-04T09:25:25.4969609Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:25:25.4970147Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:25:25.4970938Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:25:25.7788346Z Found 1 objects with prefix pytorch/pytorch/19922826259/linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck/
2025-12-04T09:25:25.7789200Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:25:33.1319622Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:25:33.1324140Z Artifact download has finished successfully
2025-12-04T09:25:33.1701045Z ##[group]Run unzip -o artifacts.zip
2025-12-04T09:25:33.1701405Z unzip -o artifacts.zip
2025-12-04T09:25:33.1712075Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:25:33.1712447Z env:
2025-12-04T09:25:33.1712648Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:25:33.1712904Z HAS_NVIDIA_GPU: true
2025-12-04T09:25:33.1713215Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:25:33.1713578Z ##[endgroup]
2025-12-04T09:25:33.1797288Z Archive: artifacts.zip
2025-12-04T09:25:33.1799094Z creating: dist/
2025-12-04T09:25:35.2538762Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl
2025-12-04T09:25:35.2675507Z inflating: dist/.ninja_log
2025-12-04T09:25:35.2676534Z creating: build/custom_test_artifacts/
2025-12-04T09:25:35.2677138Z creating: build/custom_test_artifacts/custom-op-build/
2025-12-04T09:25:35.2677797Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/
2025-12-04T09:25:35.2678609Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/
2025-12-04T09:25:35.2687720Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml
2025-12-04T09:25:35.2688672Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/
2025-12-04T09:25:35.2689564Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake
2025-12-04T09:25:35.2690569Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:25:35.2691867Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:25:35.2695018Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c
2025-12-04T09:25:35.2696719Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out
2025-12-04T09:25:35.2698113Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake
2025-12-04T09:25:35.2699219Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:25:35.2700223Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:25:35.2703559Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp
2025-12-04T09:25:35.2705199Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out
2025-12-04T09:25:35.2706887Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake
2025-12-04T09:25:35.2709694Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin
2025-12-04T09:25:35.2712564Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin
2025-12-04T09:25:35.2713660Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:25:35.2714708Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:25:35.2775214Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii
2025-12-04T09:25:35.2837429Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp
2025-12-04T09:25:35.2839241Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id
2025-12-04T09:25:35.2904892Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii
2025-12-04T09:25:35.2906326Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c
2025-12-04T09:25:35.2908067Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu
2025-12-04T09:25:35.2909217Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c
2025-12-04T09:25:35.2910711Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx
2025-12-04T09:25:35.2912203Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin
2025-12-04T09:25:35.2913652Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin
2025-12-04T09:25:35.2915074Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c
2025-12-04T09:25:35.2916483Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o
2025-12-04T09:25:35.2917812Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin
2025-12-04T09:25:35.2919139Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c
2025-12-04T09:25:35.2920107Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin
2025-12-04T09:25:35.2921533Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c
2025-12-04T09:25:35.2923331Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o
2025-12-04T09:25:35.2926370Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu
2025-12-04T09:25:35.3002410Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out
2025-12-04T09:25:35.3004022Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake
2025-12-04T09:25:35.3079117Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin
2025-12-04T09:25:35.3079911Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/
2025-12-04T09:25:35.3080949Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/
2025-12-04T09:25:35.3082407Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache
2025-12-04T09:25:35.3083655Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/
2025-12-04T09:25:35.3085047Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts
2025-12-04T09:25:35.3086622Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make
2025-12-04T09:25:35.3088126Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make
2025-12-04T09:25:35.3089515Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt
2025-12-04T09:25:35.3090958Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake
2025-12-04T09:25:35.3092145Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make
2025-12-04T09:25:35.3093216Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake
2025-12-04T09:25:35.3094008Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make
2025-12-04T09:25:35.3094804Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make
2025-12-04T09:25:35.3115386Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d
2025-12-04T09:25:35.3319923Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o
2025-12-04T09:25:35.3320661Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/
2025-12-04T09:25:35.3321458Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts
2025-12-04T09:25:35.3322790Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make
2025-12-04T09:25:35.3324035Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make
2025-12-04T09:25:35.3325159Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt
2025-12-04T09:25:35.3326318Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake
2025-12-04T09:25:35.3327472Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make
2025-12-04T09:25:35.3329125Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake
2025-12-04T09:25:35.3330283Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make
2025-12-04T09:25:35.3331656Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make
2025-12-04T09:25:35.3354574Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d
2025-12-04T09:25:35.3438631Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o
2025-12-04T09:25:35.3440099Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake
2025-12-04T09:25:35.3441202Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt
2025-12-04T09:25:35.3442197Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks
2025-12-04T09:25:35.3443126Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2
2025-12-04T09:25:35.3445685Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake
2025-12-04T09:25:35.3446599Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc
2025-12-04T09:25:35.3450051Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt
2025-12-04T09:25:35.3451113Z inflating: build/custom_test_artifacts/custom-op-build/Makefile
2025-12-04T09:25:35.3453566Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake
2025-12-04T09:25:35.3630276Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so
2025-12-04T09:25:35.3688333Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops
2025-12-04T09:25:35.3689039Z creating: build/custom_test_artifacts/jit-hook-build/
2025-12-04T09:25:35.3689678Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/
2025-12-04T09:25:35.3690511Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/
2025-12-04T09:25:35.3698642Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml
2025-12-04T09:25:35.3699642Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/
2025-12-04T09:25:35.3700517Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake
2025-12-04T09:25:35.3701720Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:25:35.3702699Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:25:35.3705368Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c
2025-12-04T09:25:35.3707125Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out
2025-12-04T09:25:35.3709098Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake
2025-12-04T09:25:35.3710071Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:25:35.3711057Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:25:35.3714330Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp
2025-12-04T09:25:35.3715960Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out
2025-12-04T09:25:35.3717671Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake
2025-12-04T09:25:35.3720413Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin
2025-12-04T09:25:35.3722635Z inflating:
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:25:35.3723709Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:25:35.3724706Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:25:35.3785773Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:25:35.3847643Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:25:35.3849147Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:25:35.3915434Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:25:35.3916865Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:25:35.3918291Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:25:35.3919758Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:25:35.3921145Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:25:35.3922527Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:25:35.3923947Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:25:35.3925369Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 
2025-12-04T09:25:35.3926713Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:25:35.3927983Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:25:35.3929229Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:25:35.3930461Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:25:35.3931757Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:25:35.3932941Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:25:35.3936300Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:25:35.4012145Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:25:35.4012941Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:25:35.4089640Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:25:35.4091188Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:25:35.4092218Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:25:35.4092849Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-12-04T09:25:35.4093533Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T09:25:35.4094305Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T09:25:35.4095192Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T09:25:35.4096038Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T09:25:35.4096825Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T09:25:35.4097639Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T09:25:35.4099040Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T09:25:35.4100281Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T09:25:35.4101432Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T09:25:35.4102987Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T09:25:35.4125429Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T09:25:35.4191424Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T09:25:35.4192615Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:25:35.4193404Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:25:35.4194110Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T09:25:35.4195232Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T09:25:35.4197639Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T09:25:35.4198291Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2025-12-04T09:25:35.4201443Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T09:25:35.4202562Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-12-04T09:25:35.4203826Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T09:25:35.4244589Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T09:25:35.4245313Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T09:25:35.4246040Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T09:25:35.4246935Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:25:35.4255093Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:25:35.4256082Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T09:25:35.4257050Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:25:35.4258067Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:25:35.4259231Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:25:35.4262015Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:25:35.4263907Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:25:35.4265320Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:25:35.4266392Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:25:35.4267474Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:25:35.4270724Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:25:35.4272311Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:25:35.4274045Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:25:35.4276728Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:25:35.4278902Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:25:35.4280066Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:25:35.4281158Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:25:35.4342832Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:25:35.4404162Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:25:35.4405795Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:25:35.4471527Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:25:35.4473079Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:25:35.4474614Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:25:35.4476176Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 
2025-12-04T09:25:35.4477690Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:25:35.4479176Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:25:35.4480668Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:25:35.4482185Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:25:35.4483630Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:25:35.4485273Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:25:35.4486604Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:25:35.4487912Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:25:35.4489218Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:25:35.4490573Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:25:35.4492976Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:25:35.4569378Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:25:35.4570596Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:25:35.4646755Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:25:35.4647913Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:25:35.4648801Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:25:35.4649755Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T09:25:35.4650759Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T09:25:35.4651908Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T09:25:35.4653248Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T09:25:35.4654509Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T09:25:35.4655883Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T09:25:35.4657102Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T09:25:35.4658304Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T09:25:35.4659640Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T09:25:35.4660877Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T09:25:35.4662113Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T09:25:35.4666167Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T09:25:35.4790306Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T09:25:35.4791514Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T09:25:35.4792948Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T09:25:35.4794427Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T09:25:35.4795840Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T09:25:35.4797448Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-12-04T09:25:35.4798868Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T09:25:35.4800534Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-12-04T09:25:35.4801995Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T09:25:35.4803481Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T09:25:35.4804904Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T09:25:35.4825179Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-12-04T09:25:35.4882050Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T09:25:35.4883571Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:25:35.4884914Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:25:35.4886093Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T09:25:35.4887273Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-12-04T09:25:35.4889368Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T09:25:35.4890469Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2025-12-04T09:25:35.4894070Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T09:25:35.4895257Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T09:25:35.4896585Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T09:25:35.5001300Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T09:25:35.5042385Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T09:25:35.5043280Z creating: build/lib/ 2025-12-04T09:25:35.5127342Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T09:25:35.5574299Z inflating: build/lib/libprotobuf.a 2025-12-04T09:25:35.6073620Z inflating: build/lib/libprotoc.a 2025-12-04T09:25:35.6082814Z inflating: build/lib/libpthreadpool.a 2025-12-04T09:25:35.6091618Z inflating: build/lib/libcpuinfo.a 2025-12-04T09:25:35.6099968Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T09:25:35.6100970Z inflating: build/lib/libclog.a 2025-12-04T09:25:35.6121375Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T09:25:35.6124520Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T09:25:35.6142820Z inflating: build/lib/libnnpack.a 2025-12-04T09:25:35.6331190Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T09:25:35.7216087Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T09:25:35.7287397Z inflating: build/lib/libgtest.a 
2025-12-04T09:25:35.7304539Z inflating: build/lib/libgmock.a 2025-12-04T09:25:35.7306008Z inflating: build/lib/libgtest_main.a 2025-12-04T09:25:35.7307159Z inflating: build/lib/libgmock_main.a 2025-12-04T09:25:35.7400253Z inflating: build/lib/libXNNPACK.a 2025-12-04T09:25:35.7477426Z inflating: build/lib/libbenchmark.a 2025-12-04T09:25:35.7478086Z inflating: build/lib/libbenchmark_main.a 2025-12-04T09:25:35.7479702Z inflating: build/lib/libjitprofiling.a 2025-12-04T09:25:35.7547619Z inflating: build/lib/libasmjit.a 2025-12-04T09:25:35.7556291Z inflating: build/lib/libittnotify.a 2025-12-04T09:25:35.8760370Z inflating: build/lib/libfbgemm.a 2025-12-04T09:25:35.8791609Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T09:25:35.9346920Z inflating: build/lib/libtensorpipe.a 2025-12-04T09:25:35.9595319Z inflating: build/lib/libtensorpipe_cuda.a 2025-12-04T09:25:35.9732404Z inflating: build/lib/libgloo.a 2025-12-04T09:25:35.9780593Z inflating: build/lib/libonnx_proto.a 2025-12-04T09:25:36.0230394Z inflating: build/lib/libgloo_cuda.a 2025-12-04T09:25:36.0953289Z inflating: build/lib/libonnx.a 2025-12-04T09:25:37.1288838Z inflating: build/lib/libdnnl.a 2025-12-04T09:25:37.1309066Z inflating: build/lib/libfmt.a 2025-12-04T09:25:37.1791595Z inflating: build/lib/libkineto.a 2025-12-04T09:25:37.1909136Z inflating: build/lib/libc10.so 2025-12-04T09:25:37.1959278Z inflating: build/lib/libc10_cuda.so 2025-12-04T09:25:37.1961519Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T09:25:37.1963286Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T09:25:40.3326316Z inflating: build/lib/libtorch_cpu.so 2025-12-04T09:25:40.4116899Z inflating: build/lib/libtorch_nvshmem.so 2025-12-04T09:25:42.3928158Z inflating: build/lib/libtorch_cuda.so 2025-12-04T09:25:42.3929440Z inflating: build/lib/libtorch.so 2025-12-04T09:25:42.3981901Z inflating: build/lib/libtorch_cuda_linalg.so 2025-12-04T09:25:42.4054152Z inflating: build/lib/libtorchbind_test.so 2025-12-04T09:25:42.4073375Z 
inflating: build/lib/libjitbackend_test.so 2025-12-04T09:25:42.4097478Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T09:25:42.4124333Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T09:25:42.4127253Z inflating: build/lib/libc10d_cuda_test.so 2025-12-04T09:25:42.4131834Z inflating: build/lib/libshm.so 2025-12-04T09:25:42.6539722Z inflating: build/lib/libtorch_python.so 2025-12-04T09:25:42.6576519Z inflating: build/lib/libnnapi_backend.so 2025-12-04T09:25:42.6576847Z creating: build/bin/ 2025-12-04T09:25:42.7037471Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T09:25:42.7496384Z inflating: build/bin/protoc 2025-12-04T09:25:42.7556821Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T09:25:42.7612948Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T09:25:42.7670262Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T09:25:42.7728571Z inflating: build/bin/c10_Device_test 2025-12-04T09:25:42.7795210Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T09:25:42.7849881Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T09:25:42.7910159Z inflating: build/bin/c10_Scalar_test 2025-12-04T09:25:42.7972619Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T09:25:42.8033655Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T09:25:42.8096447Z inflating: build/bin/c10_SymInt_test 2025-12-04T09:25:42.8158625Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T09:25:42.8214347Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T09:25:42.8291285Z inflating: build/bin/c10_cow_test 2025-12-04T09:25:42.8347132Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T09:25:42.8403023Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T09:25:42.8462276Z inflating: build/bin/c10_Bitset_test 2025-12-04T09:25:42.8526093Z inflating: build/bin/c10_Enumerate_test 2025-12-04T09:25:42.8584678Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T09:25:42.8641343Z inflating: build/bin/c10_Half_test 
2025-12-04T09:25:42.8703613Z inflating: build/bin/c10_LeftRight_test 2025-12-04T09:25:42.8763766Z inflating: build/bin/c10_NetworkFlow_test 2025-12-04T09:25:42.8818878Z inflating: build/bin/c10_Semaphore_test 2025-12-04T09:25:42.8875115Z inflating: build/bin/c10_Synchronized_test 2025-12-04T09:25:42.8937698Z inflating: build/bin/c10_ThreadLocal_test 2025-12-04T09:25:42.8995529Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T09:25:42.9053479Z inflating: build/bin/c10_accumulate_test 2025-12-04T09:25:42.9116136Z inflating: build/bin/c10_bfloat16_test 2025-12-04T09:25:42.9172780Z inflating: build/bin/c10_bit_cast_test 2025-12-04T09:25:42.9234423Z inflating: build/bin/c10_complex_test 2025-12-04T09:25:42.9297903Z inflating: build/bin/c10_complex_math_test 2025-12-04T09:25:42.9353251Z inflating: build/bin/c10_error_test 2025-12-04T09:25:42.9411929Z inflating: build/bin/c10_exception_test 2025-12-04T09:25:42.9468183Z inflating: build/bin/c10_flags_test 2025-12-04T09:25:42.9525008Z inflating: build/bin/c10_generic_math_test 2025-12-04T09:25:42.9581745Z inflating: build/bin/c10_irange_test 2025-12-04T09:25:42.9642086Z inflating: build/bin/c10_lazy_test 2025-12-04T09:25:42.9812411Z inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T09:25:42.9875571Z inflating: build/bin/c10_logging_test 2025-12-04T09:25:42.9931317Z inflating: build/bin/c10_nofatal_test 2025-12-04T09:25:43.0014001Z inflating: build/bin/c10_optional_test 2025-12-04T09:25:43.0073325Z inflating: build/bin/c10_registry_test 2025-12-04T09:25:43.0141949Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T09:25:43.0308090Z inflating: build/bin/c10_small_vector_test 2025-12-04T09:25:43.0371866Z inflating: build/bin/c10_string_util_test 2025-12-04T09:25:43.0430083Z inflating: build/bin/c10_ssize_test 2025-12-04T09:25:43.0486315Z inflating: build/bin/c10_tempfile_test 2025-12-04T09:25:43.0541545Z inflating: build/bin/c10_string_view_test 2025-12-04T09:25:43.0604350Z inflating: 
build/bin/c10_typeid_test 2025-12-04T09:25:43.0652980Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T09:25:43.0712522Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2025-12-04T09:25:43.0771928Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2025-12-04T09:25:43.0830916Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2025-12-04T09:25:43.0890264Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2025-12-04T09:25:43.0946537Z inflating: build/bin/c10_cuda_CUDATest 2025-12-04T09:25:43.1006222Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T09:25:43.1064984Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2025-12-04T09:25:43.1125007Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2025-12-04T09:25:43.1747343Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T09:25:43.2383868Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T09:25:43.3029600Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T09:25:43.3136100Z inflating: build/bin/test_aoti_abi_check 2025-12-04T09:25:43.3191584Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T09:25:43.3247877Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T09:25:43.3303485Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T09:25:43.3384308Z inflating: build/bin/Dict_test 2025-12-04T09:25:43.3443471Z inflating: build/bin/Dimname_test 2025-12-04T09:25:43.3515730Z inflating: build/bin/MaybeOwned_test 2025-12-04T09:25:43.3579259Z inflating: build/bin/NamedTensor_test 2025-12-04T09:25:43.3644777Z inflating: build/bin/apply_utils_test 2025-12-04T09:25:43.3710085Z inflating: build/bin/atest 2025-12-04T09:25:43.3780584Z inflating: build/bin/basic 2025-12-04T09:25:43.3841277Z inflating: build/bin/broadcast_test 2025-12-04T09:25:43.3899168Z inflating: build/bin/cpu_allocator_test 
2025-12-04T09:25:43.3963573Z inflating: build/bin/cpu_generator_test 2025-12-04T09:25:43.4022782Z inflating: build/bin/cpu_profiling_allocator_test 2025-12-04T09:25:43.4123674Z inflating: build/bin/cpu_rng_test 2025-12-04T09:25:43.4180889Z inflating: build/bin/dlconvertor_test 2025-12-04T09:25:43.4245634Z inflating: build/bin/extension_backend_test 2025-12-04T09:25:43.4306798Z inflating: build/bin/half_test 2025-12-04T09:25:43.4412717Z inflating: build/bin/ivalue_test 2025-12-04T09:25:43.4468571Z inflating: build/bin/lazy_tensor_test 2025-12-04T09:25:43.4527606Z inflating: build/bin/math_kernel_test 2025-12-04T09:25:43.4586742Z inflating: build/bin/memory_format_test 2025-12-04T09:25:43.4646634Z inflating: build/bin/memory_overlapping_test 2025-12-04T09:25:43.4706282Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T09:25:43.4769162Z inflating: build/bin/native_test 2025-12-04T09:25:43.4825817Z inflating: build/bin/operator_name_test 2025-12-04T09:25:43.4882434Z inflating: build/bin/operators_test 2025-12-04T09:25:43.4941224Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T09:25:43.5015795Z inflating: build/bin/pow_test 2025-12-04T09:25:43.5078327Z inflating: build/bin/quantized_test 2025-12-04T09:25:43.5134694Z inflating: build/bin/reduce_ops_test 2025-12-04T09:25:43.5191514Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T09:25:43.5253854Z inflating: build/bin/scalar_tensor_test 2025-12-04T09:25:43.5317853Z inflating: build/bin/scalar_test 2025-12-04T09:25:43.5376005Z inflating: build/bin/StorageUtils_test 2025-12-04T09:25:43.5434056Z inflating: build/bin/stride_properties_test 2025-12-04T09:25:43.5520827Z inflating: build/bin/tensor_iterator_test 2025-12-04T09:25:43.5581152Z inflating: build/bin/test_parallel 2025-12-04T09:25:43.5638078Z inflating: build/bin/thread_init_test 2025-12-04T09:25:43.5699318Z inflating: build/bin/type_ptr_test 2025-12-04T09:25:43.5765141Z inflating: build/bin/type_test 2025-12-04T09:25:43.5823878Z inflating: 
build/bin/undefined_tensor_test 2025-12-04T09:25:43.5879646Z inflating: build/bin/verify_api_visibility 2025-12-04T09:25:43.5958536Z inflating: build/bin/legacy_vmap_test 2025-12-04T09:25:43.6015188Z inflating: build/bin/weakref_test 2025-12-04T09:25:43.6074078Z inflating: build/bin/wrapdim_test 2025-12-04T09:25:43.6131679Z inflating: build/bin/xla_tensor_test 2025-12-04T09:25:43.6197719Z inflating: build/bin/IListRef_test 2025-12-04T09:25:43.6312685Z inflating: build/bin/List_test 2025-12-04T09:25:43.6385603Z inflating: build/bin/KernelFunction_test 2025-12-04T09:25:43.6516301Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T09:25:43.6619681Z inflating: build/bin/kernel_function_test 2025-12-04T09:25:43.6756230Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T09:25:43.6867268Z inflating: build/bin/kernel_lambda_test 2025-12-04T09:25:43.6933515Z inflating: build/bin/kernel_stackbased_test 2025-12-04T09:25:43.7037583Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T09:25:43.7094175Z inflating: build/bin/CppSignature_test 2025-12-04T09:25:43.7155956Z inflating: build/bin/backend_fallback_test 2025-12-04T09:25:43.7211261Z inflating: build/bin/op_allowlist_test 2025-12-04T09:25:43.7542280Z inflating: build/bin/op_registration_test 2025-12-04T09:25:43.7615621Z inflating: build/bin/inline_container_test 2025-12-04T09:25:43.7675134Z inflating: build/bin/cuda_allocator_test 2025-12-04T09:25:43.7734803Z inflating: build/bin/cuda_apply_test 2025-12-04T09:25:43.7801211Z inflating: build/bin/cuda_atomic_ops_test 2025-12-04T09:25:43.7864499Z inflating: build/bin/cuda_caching_host_allocator_test 2025-12-04T09:25:43.7942278Z inflating: build/bin/cuda_complex_math_test 2025-12-04T09:25:43.8008054Z inflating: build/bin/cuda_complex_test 2025-12-04T09:25:43.8078027Z inflating: build/bin/cuda_cub_test 2025-12-04T09:25:43.8136643Z inflating: build/bin/cuda_cublas_handle_pool_test 2025-12-04T09:25:43.8192423Z inflating: 
build/bin/cuda_device_test 2025-12-04T09:25:43.8264101Z inflating: build/bin/cuda_distributions_test 2025-12-04T09:25:43.8323333Z inflating: build/bin/cuda_event_test 2025-12-04T09:25:43.8381933Z inflating: build/bin/cuda_dlconvertor_test 2025-12-04T09:25:43.8437001Z inflating: build/bin/cuda_exchange_device_test 2025-12-04T09:25:43.8495954Z inflating: build/bin/cuda_reportMemoryUsage_test 2025-12-04T09:25:43.8551977Z inflating: build/bin/cuda_allocatorTraceTracker_test 2025-12-04T09:25:43.8609954Z inflating: build/bin/cuda_integer_divider_test 2025-12-04T09:25:43.8677600Z inflating: build/bin/cuda_stream_test 2025-12-04T09:25:43.8733268Z inflating: build/bin/cuda_cudnn_test 2025-12-04T09:25:43.8789436Z inflating: build/bin/cuda_half_test 2025-12-04T09:25:43.8852336Z inflating: build/bin/cuda_generator_test 2025-12-04T09:25:43.8908145Z inflating: build/bin/cuda_optional_test 2025-12-04T09:25:43.8966165Z inflating: build/bin/cuda_packedtensoraccessor_test 2025-12-04T09:25:43.9025343Z inflating: build/bin/cuda_vectorized_test 2025-12-04T09:25:44.0168661Z inflating: build/bin/test_jit 2025-12-04T09:25:44.0227906Z inflating: build/bin/BackoffTest 2025-12-04T09:25:44.0287766Z inflating: build/bin/FileStoreTest 2025-12-04T09:25:44.0657096Z inflating: build/bin/test_lazy 2025-12-04T09:25:44.0719942Z inflating: build/bin/TCPStoreTest 2025-12-04T09:25:44.0779988Z inflating: build/bin/HashStoreTest 2025-12-04T09:25:44.0794186Z inflating: build/bin/ProcessGroupMPITest 2025-12-04T09:25:44.0797374Z inflating: build/bin/example_allreduce 2025-12-04T09:25:44.0871378Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T09:25:44.0934749Z inflating: build/bin/ProcessGroupGlooAsyncTest 2025-12-04T09:25:44.1005826Z inflating: build/bin/ProcessGroupNCCLTest 2025-12-04T09:25:44.1073652Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2025-12-04T09:25:44.1135802Z inflating: build/bin/test_dist_autograd 2025-12-04T09:25:44.1211614Z inflating: build/bin/test_cpp_rpc 
2025-12-04T09:25:44.1214209Z inflating: build/bin/parallel_benchmark 2025-12-04T09:25:44.2437209Z inflating: build/bin/test_api 2025-12-04T09:25:44.2441364Z inflating: build/bin/torch_shm_manager 2025-12-04T09:25:44.2441692Z creating: .additional_ci_files/ 2025-12-04T09:25:44.2507433Z inflating: .additional_ci_files/test-times.json 2025-12-04T09:25:44.2745549Z inflating: .additional_ci_files/test-class-times.json 2025-12-04T09:25:44.2795452Z ##[group]Run rm artifacts.zip 2025-12-04T09:25:44.2795769Z rm artifacts.zip 2025-12-04T09:25:44.2805227Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:25:44.2805601Z env: 2025-12-04T09:25:44.2805995Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:44.2806259Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:44.2806568Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:44.2806912Z ##[endgroup] 2025-12-04T09:25:44.4282542Z ##[group]Run df -H 2025-12-04T09:25:44.4282779Z df -H 2025-12-04T09:25:44.4291859Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:25:44.4292436Z env: 2025-12-04T09:25:44.4292635Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:44.4292888Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:44.4293199Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:44.4293547Z ##[endgroup] 2025-12-04T09:25:44.4348782Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T09:25:44.4349238Z devtmpfs 4.2M 0 4.2M 0% /dev 2025-12-04T09:25:44.4349580Z tmpfs 34G 0 34G 0% /dev/shm 2025-12-04T09:25:44.4349902Z tmpfs 14G 562k 14G 1% /run 2025-12-04T09:25:44.4350211Z /dev/nvme0n1p1 161G 54G 108G 34% / 2025-12-04T09:25:44.4350535Z tmpfs 34G 17k 34G 1% /tmp 2025-12-04T09:25:44.4350865Z /dev/nvme0n1p128 11M 1.4M 9.2M 13% /boot/efi 2025-12-04T09:25:44.4351209Z tmpfs 6.7G 0 6.7G 0% /run/user/0 2025-12-04T09:25:44.4387850Z Prepare all required actions 2025-12-04T09:25:44.4388883Z Getting action download info 2025-12-04T09:25:44.5914504Z ##[group]Run 
./.github/actions/download-td-artifacts 2025-12-04T09:25:44.5914847Z with: 2025-12-04T09:25:44.5915033Z env: 2025-12-04T09:25:44.5915228Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:44.5915488Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:44.5915796Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:44.5916142Z ##[endgroup] 2025-12-04T09:25:44.6377533Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:25:44.6378021Z with: 2025-12-04T09:25:44.6378205Z name: td_results 2025-12-04T09:25:44.6378436Z s3-bucket: gha-artifacts 2025-12-04T09:25:44.6378684Z region: us-east-1 2025-12-04T09:25:44.6378882Z env: 2025-12-04T09:25:44.6379175Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:44.6379427Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:44.6379726Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:44.6380255Z ##[endgroup] 2025-12-04T09:25:45.2579012Z (node:59451) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:25:45.2579999Z 2025-12-04T09:25:45.2580344Z Please migrate your code to use AWS SDK for JavaScript (v3). 
2025-12-04T09:25:45.2581299Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:25:45.2582319Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:25:45.3620565Z Found 1 objects with prefix pytorch/pytorch/19922826259/td_results/ 2025-12-04T09:25:45.3621223Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:25:45.4207533Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:25:45.4213269Z Artifact download has finished successfully 2025-12-04T09:25:45.4556196Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T09:25:45.4556578Z mkdir -p .additional_ci_files 2025-12-04T09:25:45.4557013Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T09:25:45.4566999Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:25:45.4567360Z env: 2025-12-04T09:25:45.4567566Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:45.4567826Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:45.4568127Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:45.4568473Z ##[endgroup] 2025-12-04T09:25:45.4678414Z ##[group]Run .github/scripts/parse_ref.py 2025-12-04T09:25:45.4678799Z .github/scripts/parse_ref.py 2025-12-04T09:25:45.4687708Z shell: /usr/bin/bash -e {0} 2025-12-04T09:25:45.4687968Z env: 2025-12-04T09:25:45.4688171Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:45.4688429Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:45.4688727Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:45.4689080Z ##[endgroup] 2025-12-04T09:25:45.4929699Z Setting output branch=main 2025-12-04T09:25:45.5071513Z Prepare all required actions 2025-12-04T09:25:45.5071864Z Getting action download info 2025-12-04T09:25:45.6271156Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T09:25:45.6271492Z with: 2025-12-04T09:25:45.6271874Z github-token: *** 
2025-12-04T09:25:45.6281175Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": 
"linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:25:45.6291288Z job-name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check) 2025-12-04T09:25:45.6292020Z env: 2025-12-04T09:25:45.6292226Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:45.6292483Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:45.6292787Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:45.6293139Z ##[endgroup] 2025-12-04T09:25:45.6328954Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:25:45.6329244Z with: 2025-12-04T09:25:45.6329451Z shell: bash 2025-12-04T09:25:45.6329665Z timeout_minutes: 10 2025-12-04T09:25:45.6329901Z max_attempts: 5 2025-12-04T09:25:45.6330122Z retry_wait_seconds: 30 2025-12-04T09:25:45.6330914Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:25:45.6331935Z polling_interval_seconds: 1 2025-12-04T09:25:45.6332209Z warning_on_retry: true 2025-12-04T09:25:45.6332465Z continue_on_error: false 2025-12-04T09:25:45.6332705Z env: 2025-12-04T09:25:45.6332900Z GIT_DEFAULT_BRANCH: 
main 2025-12-04T09:25:45.6333157Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:45.6333456Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:45.6333939Z GITHUB_TOKEN: *** 2025-12-04T09:25:45.6334154Z ##[endgroup] 2025-12-04T09:25:45.7358016Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:25:45.9721811Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:25:46.0952086Z Collecting requests==2.27.1 2025-12-04T09:25:46.1119879Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T09:25:46.3103990Z Collecting pyyaml==6.0.2 2025-12-04T09:25:46.3163394Z Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB) 2025-12-04T09:25:46.3922158Z Collecting certifi>=2017.4.17 2025-12-04T09:25:46.3959198Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T09:25:46.4029172Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10) 2025-12-04T09:25:46.4032766Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10) 2025-12-04T09:25:46.8450818Z Collecting charset-normalizer~=2.0.0 2025-12-04T09:25:46.8489709Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T09:25:46.9367829Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml 2025-12-04T09:25:47.0597635Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1 2025-12-04T09:25:47.7122331Z Command completed after 1 attempt(s). 
2025-12-04T09:25:47.7190272Z ##[group]Run set -x 2025-12-04T09:25:47.7202134Z set -x 2025-12-04T09:25:47.7202389Z  2025-12-04T09:25:47.7202775Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:25:47.7203250Z # in runner workspace 2025-12-04T09:25:47.7203641Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T09:25:47.7213227Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:25:47.7213594Z env: 2025-12-04T09:25:47.7213791Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:47.7214045Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:47.7214347Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:47.7214681Z ##[endgroup] 2025-12-04T09:25:47.7245916Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T09:25:47.7429525Z Setting output branch=main 2025-12-04T09:25:47.7485716Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:25:47.7486160Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:25:47.7486494Z echo "Job name: ${JOB_NAME}" 2025-12-04T09:25:47.7486801Z  2025-12-04T09:25:47.7487162Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:25:47.7487639Z # in runner workspace 2025-12-04T09:25:47.7488062Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T09:25:47.7488533Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T09:25:47.7488864Z  --job-name "${JOB_NAME}" \ 2025-12-04T09:25:47.7498449Z  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": 
"linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, 
{"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]}" \ 2025-12-04T09:25:47.7508816Z  --selected-test-configs "" \ 2025-12-04T09:25:47.7509252Z  --pr-number "${PR_NUMBER}" \ 2025-12-04T09:25:47.7509631Z  --tag "${TAG}" \ 2025-12-04T09:25:47.7509958Z  --event-name "${EVENT_NAME}" \ 2025-12-04T09:25:47.7510286Z  --schedule "${SCHEDULE}" \ 2025-12-04T09:25:47.7510611Z  --branch "${HEAD_BRANCH}" 2025-12-04T09:25:47.7519707Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:25:47.7520079Z env: 2025-12-04T09:25:47.7520299Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:47.7520559Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:47.7520870Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:47.7521556Z GITHUB_TOKEN: *** 2025-12-04T09:25:47.7522248Z JOB_NAME: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check) 2025-12-04T09:25:47.7522985Z PR_NUMBER: 2025-12-04T09:25:47.7523206Z TAG: 2025-12-04T09:25:47.7523401Z EVENT_NAME: schedule 2025-12-04T09:25:47.7523638Z SCHEDULE: 29 8 * * * 2025-12-04T09:25:47.7523864Z HEAD_BRANCH: main 2025-12-04T09:25:47.7524088Z ##[endgroup] 2025-12-04T09:25:47.7553295Z Workflow: periodic 2025-12-04T09:25:47.7554006Z Job name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check) 2025-12-04T09:25:47.9492466Z Setting output keep-going=True 2025-12-04T09:25:47.9492810Z Setting output ci-verbose-test-logs=False 2025-12-04T09:25:47.9493159Z Setting output ci-test-showlocals=False 2025-12-04T09:25:47.9493481Z Setting output ci-no-test-timeout=False 
2025-12-04T09:25:47.9493796Z Setting output ci-no-td=False 2025-12-04T09:25:47.9494356Z Setting output ci-td-distributed=False 2025-12-04T09:25:47.9494682Z Setting output is-unstable=False 2025-12-04T09:25:47.9494968Z Setting output reenabled-issues= 2025-12-04T09:25:47.9516386Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], 
"mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, 
{"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": 
["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:25:47.9537782Z Setting output is-test-matrix-empty=False 2025-12-04T09:25:47.9618447Z ##[group]Run echo "Filtered matrix:" 2025-12-04T09:25:47.9618809Z echo "Filtered matrix:" 2025-12-04T09:25:47.9639844Z echo "{"include": [{"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": 
["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": 
"default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 7, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 7, "num_shards": 
8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 8, "num_shards": 8, "runner": "linux.g5.4xlarge.nvidia.gpu", "owners": ["module:slowgradcheck"], "rerun_disabled_tests": "rerun_disabled_tests"}]}" 2025-12-04T09:25:47.9660953Z  2025-12-04T09:25:47.9661148Z echo 2025-12-04T09:25:47.9661416Z echo "Is the current job unstable? False" 2025-12-04T09:25:47.9661735Z  2025-12-04T09:25:47.9661934Z echo 2025-12-04T09:25:47.9662182Z echo "Is keep-going label set? True" 2025-12-04T09:25:47.9662485Z  2025-12-04T09:25:47.9662683Z echo 2025-12-04T09:25:47.9662908Z echo "Reenabled issues? 
" 2025-12-04T09:25:47.9672085Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:25:47.9672454Z env: 2025-12-04T09:25:47.9672664Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:47.9672915Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:47.9673225Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:47.9673590Z ##[endgroup] 2025-12-04T09:25:47.9704064Z Filtered matrix: 2025-12-04T09:25:47.9729599Z {include: [{config: default, shard: 1, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 1, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 8, runner: 
linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 6, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: 
default, shard: 6, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 6, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 6, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 7, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 7, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 7, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 7, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 8, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check}, {config: default, shard: 8, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 8, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 8, num_shards: 8, runner: linux.g5.4xlarge.nvidia.gpu, owners: [module:slowgradcheck], rerun_disabled_tests: rerun_disabled_tests}]} 2025-12-04T09:25:47.9750411Z 2025-12-04T09:25:47.9750528Z Is the current job unstable? 
False 2025-12-04T09:25:47.9750742Z 2025-12-04T09:25:47.9750848Z Is keep-going label set? True 2025-12-04T09:25:47.9751030Z 2025-12-04T09:25:47.9751122Z Reenabled issues? 2025-12-04T09:25:47.9782628Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:25:47.9783150Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:25:47.9791533Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:25:47.9791894Z env: 2025-12-04T09:25:47.9792109Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:47.9792366Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:47.9792665Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:47.9793023Z JOB_TIMEOUT: 600 2025-12-04T09:25:47.9793248Z ##[endgroup] 2025-12-04T09:25:47.9847371Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:25:47.9847900Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:25:47.9848379Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:25:47.9856552Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:25:47.9856912Z env: 2025-12-04T09:25:47.9857123Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:47.9857391Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:47.9857696Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:47.9858056Z ##[endgroup] 2025-12-04T09:25:47.9979212Z ##[group]Run set -x 2025-12-04T09:25:47.9979519Z set -x 2025-12-04T09:25:47.9979736Z  2025-12-04T09:25:47.9979985Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T09:25:47.9980366Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T09:25:47.9980758Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T09:25:47.9981113Z  TEST_COMMAND=.ci/onnx/test.sh 2025-12-04T09:25:47.9981422Z else 2025-12-04T09:25:47.9981656Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:25:47.9981957Z fi 2025-12-04T09:25:47.9982145Z  2025-12-04T09:25:47.9982392Z # Leaving 1GB for the runner and other 
things 2025-12-04T09:25:47.9982979Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo) 2025-12-04T09:25:47.9983856Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap 2025-12-04T09:25:47.9984567Z # comes from https://github.com/pytorch/test-infra/pull/6058 2025-12-04T09:25:47.9985106Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3)) 2025-12-04T09:25:47.9985525Z  2025-12-04T09:25:47.9985781Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:25:47.9986117Z  SHM_OPTS= 2025-12-04T09:25:47.9986536Z  JENKINS_USER= 2025-12-04T09:25:47.9986889Z  # ensure that docker container cleanly exits in 12 hours 2025-12-04T09:25:47.9987364Z  # if for some reason cleanup action doesn't stop container 2025-12-04T09:25:47.9987765Z  # when job is cancelled 2025-12-04T09:25:47.9988078Z  DOCKER_SHELL_CMD="sleep 12h" 2025-12-04T09:25:47.9988406Z  USED_IMAGE="${DOCKER_IMAGE_S390X}" 2025-12-04T09:25:47.9988721Z else 2025-12-04T09:25:47.9988966Z  SHM_OPTS="--shm-size=${SHM_SIZE}" 2025-12-04T09:25:47.9989308Z  JENKINS_USER="--user jenkins" 2025-12-04T09:25:47.9989615Z  DOCKER_SHELL_CMD= 2025-12-04T09:25:47.9989910Z  USED_IMAGE="${DOCKER_IMAGE}" 2025-12-04T09:25:47.9990204Z fi 2025-12-04T09:25:47.9990397Z  2025-12-04T09:25:47.9990741Z # detached container should get cleaned up by teardown_ec2_linux 2025-12-04T09:25:47.9991291Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T09:25:47.9991912Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice 2025-12-04T09:25:47.9992440Z # shellcheck disable=SC2086,SC2090 2025-12-04T09:25:47.9992781Z container_name=$(docker run \ 2025-12-04T09:25:47.9993087Z  ${GPU_FLAG:-} \ 2025-12-04T09:25:47.9993376Z  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \ 2025-12-04T09:25:47.9993715Z  -e BUILD_ENVIRONMENT \ 2025-12-04T09:25:47.9994011Z  -e PR_NUMBER \ 
2025-12-04T09:25:47.9994274Z  -e GITHUB_ACTIONS \ 2025-12-04T09:25:47.9994552Z  -e GITHUB_REPOSITORY \ 2025-12-04T09:25:47.9994846Z  -e GITHUB_WORKFLOW \ 2025-12-04T09:25:47.9995125Z  -e GITHUB_JOB \ 2025-12-04T09:25:47.9995382Z  -e GITHUB_RUN_ID \ 2025-12-04T09:25:47.9995684Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T09:25:47.9996000Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T09:25:47.9996279Z  -e JOB_ID \ 2025-12-04T09:25:47.9996528Z  -e JOB_NAME \ 2025-12-04T09:25:47.9996774Z  -e BASE_SHA \ 2025-12-04T09:25:47.9997014Z  -e BRANCH \ 2025-12-04T09:25:47.9997257Z  -e SHA1 \ 2025-12-04T09:25:47.9997496Z  -e AWS_DEFAULT_REGION \ 2025-12-04T09:25:47.9997776Z  -e IN_WHEEL_TEST \ 2025-12-04T09:25:47.9998041Z  -e SHARD_NUMBER \ 2025-12-04T09:25:47.9998307Z  -e TEST_CONFIG \ 2025-12-04T09:25:47.9998583Z  -e NUM_TEST_SHARDS \ 2025-12-04T09:25:47.9998988Z  -e REENABLED_ISSUES \ 2025-12-04T09:25:47.9999282Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T09:25:47.9999594Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T09:25:47.9999878Z  -e TEST_SHOWLOCALS \ 2025-12-04T09:25:48.0000158Z  -e NO_TEST_TIMEOUT \ 2025-12-04T09:25:48.0000430Z  -e NO_TD \ 2025-12-04T09:25:48.0000670Z  -e TD_DISTRIBUTED \ 2025-12-04T09:25:48.0000951Z  -e PR_LABELS \ 2025-12-04T09:25:48.0001236Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T09:25:48.0001558Z  -e SCCACHE_BUCKET \ 2025-12-04T09:25:48.0001845Z  -e SCCACHE_REGION \ 2025-12-04T09:25:48.0002129Z  -e XLA_CUDA \ 2025-12-04T09:25:48.0002424Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2025-12-04T09:25:48.0002781Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T09:25:48.0003155Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T09:25:48.0003534Z  -e SKIP_SCCACHE_INITIALIZATION=1 \ 2025-12-04T09:25:48.0003873Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T09:25:48.0004207Z  -e VLLM_TEST_HUGGING_FACE_TOKEN \ 2025-12-04T09:25:48.0004555Z  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \ 2025-12-04T09:25:48.0004887Z  -e DASHBOARD_TAG \ 2025-12-04T09:25:48.0005168Z  -e ARTIFACTS_FILE_SUFFIX \ 
2025-12-04T09:25:48.0005537Z  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \ 2025-12-04T09:25:48.0006066Z  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \ 2025-12-04T09:25:48.0006475Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T09:25:48.0006879Z  --security-opt seccomp=unconfined \ 2025-12-04T09:25:48.0007221Z  --cap-add=SYS_PTRACE \ 2025-12-04T09:25:48.0007508Z  --ipc=host \ 2025-12-04T09:25:48.0007973Z  ${SHM_OPTS} \ 2025-12-04T09:25:48.0008305Z  --tty \ 2025-12-04T09:25:48.0008571Z  --detach \ 2025-12-04T09:25:48.0008829Z  --name="${container_name}" \ 2025-12-04T09:25:48.0009130Z  ${JENKINS_USER} \ 2025-12-04T09:25:48.0009465Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T09:25:48.0009854Z  -w /var/lib/jenkins/workspace \ 2025-12-04T09:25:48.0010183Z  "${USED_IMAGE}" \ 2025-12-04T09:25:48.0010467Z  ${DOCKER_SHELL_CMD} 2025-12-04T09:25:48.0010726Z ) 2025-12-04T09:25:48.0011059Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2025-12-04T09:25:48.0011484Z  2025-12-04T09:25:48.0011749Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:25:48.0012332Z  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt" 2025-12-04T09:25:48.0012868Z fi 2025-12-04T09:25:48.0013090Z  2025-12-04T09:25:48.0013611Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2025-12-04T09:25:48.0022639Z shell: /usr/bin/bash -e {0} 2025-12-04T09:25:48.0022907Z env: 2025-12-04T09:25:48.0023125Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:25:48.0023397Z HAS_NVIDIA_GPU: true 2025-12-04T09:25:48.0023698Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:25:48.0024187Z BUILD_ENVIRONMENT: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck 2025-12-04T09:25:48.0024608Z PR_NUMBER: 2025-12-04T09:25:48.0024842Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T09:25:48.0025150Z GITHUB_WORKFLOW: periodic 2025-12-04T09:25:48.0025416Z 
GITHUB_JOB: test 2025-12-04T09:25:48.0025645Z GITHUB_RUN_ID: 19922826259 2025-12-04T09:25:48.0025915Z GITHUB_RUN_NUMBER: 19107 2025-12-04T09:25:48.0026174Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T09:25:48.0026407Z JOB_ID: 57118183212 2025-12-04T09:25:48.0027084Z JOB_NAME: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check) 2025-12-04T09:25:48.0027988Z BRANCH: main 2025-12-04T09:25:48.0028247Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:25:48.0028615Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:25:48.0028958Z TEST_CONFIG: default 2025-12-04T09:25:48.0029190Z SHARD_NUMBER: 2 2025-12-04T09:25:48.0029400Z NUM_TEST_SHARDS: 8 2025-12-04T09:25:48.0029625Z EXTRA_FLAGS: 2025-12-04T09:25:48.0029848Z OP_BENCHMARK_TESTS: 2025-12-04T09:25:48.0030077Z REENABLED_ISSUES: 2025-12-04T09:25:48.0030321Z CONTINUE_THROUGH_ERROR: True 2025-12-04T09:25:48.0030596Z VERBOSE_TEST_LOGS: False 2025-12-04T09:25:48.0030855Z TEST_SHOWLOCALS: False 2025-12-04T09:25:48.0031101Z NO_TEST_TIMEOUT: False 2025-12-04T09:25:48.0031358Z NO_TD: False 2025-12-04T09:25:48.0031581Z TD_DISTRIBUTED: False 2025-12-04T09:25:48.0031882Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-12-04T09:25:48.0032256Z SCCACHE_REGION: us-east-1 2025-12-04T09:25:48.0032522Z SHM_SIZE: 2g 2025-12-04T09:25:48.0033319Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:25:48.0034788Z DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:25:48.0035682Z XLA_CUDA: 2025-12-04T09:25:48.0036169Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:25:48.0036628Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T09:25:48.0036953Z 
PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T09:25:48.0037265Z DASHBOARD_TAG: 2025-12-04T09:25:48.0037694Z VLLM_TEST_HUGGING_FACE_TOKEN: *** 2025-12-04T09:25:48.0038111Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T09:25:48.0038528Z SCRIBE_GRAPHQL_ACCESS_TOKEN: *** 2025-12-04T09:25:48.0039007Z ARTIFACTS_FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212 2025-12-04T09:25:48.0039494Z ##[endgroup] 2025-12-04T09:25:48.0068226Z + [[ default == \m\u\l\t\i\g\p\u ]] 2025-12-04T09:25:48.0068661Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *onnx* ]] 2025-12-04T09:25:48.0069077Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:25:48.0072557Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo 2025-12-04T09:25:48.0098761Z + TOTAL_AVAILABLE_MEMORY_IN_GB='61.094 ' 2025-12-04T09:25:48.0099145Z + TOTAL_MEMORY_WITH_SWAP=64 2025-12-04T09:25:48.0099533Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *\s\3\9\0\x* ]] 2025-12-04T09:25:48.0099962Z + SHM_OPTS=--shm-size=2g 2025-12-04T09:25:48.0100234Z + JENKINS_USER='--user jenkins' 2025-12-04T09:25:48.0100487Z + DOCKER_SHELL_CMD= 2025-12-04T09:25:48.0101284Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:25:48.0109290Z +++ nproc --ignore=2 2025-12-04T09:25:48.0139545Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=14 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e 
XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=61g --memory-swap=64g --env-file=/tmp/github_env_19922826259 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:25:56.8346219Z + container_name=5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T09:25:56.8347466Z + echo DOCKER_CONTAINER_ID=5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T09:25:56.8348301Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *\s\3\9\0\x* ]] 2025-12-04T09:25:56.8353175Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:25:56.8356768Z + docker exec -t 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh' 2025-12-04T09:25:57.3173283Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f) 2025-12-04T09:25:57.6412642Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T09:25:57.6416309Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T09:25:57.6421929Z Requirement already satisfied: 
sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T09:25:57.6426377Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T09:25:57.6429835Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T09:25:57.6434231Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T09:25:57.6446915Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0) 2025-12-04T09:25:57.6832438Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4) 2025-12-04T09:25:57.6851440Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T09:25:57.6907981Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T09:25:58.0659287Z Installing collected packages: torch 2025-12-04T09:26:09.5078058Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T09:26:09.5795537Z + export TERM=vt100 2025-12-04T09:26:09.5795780Z + TERM=vt100 2025-12-04T09:26:09.5799222Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:26:09.5811321Z + source .ci/pytorch/common.sh 2025-12-04T09:26:09.5815337Z +++ dirname 
.ci/pytorch/common.sh 2025-12-04T09:26:09.5824853Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T09:26:09.5826214Z +++ declare -f -t trap_add 2025-12-04T09:26:09.5832261Z ++ set -ex -o pipefail 2025-12-04T09:26:09.5832689Z ++ [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *rocm* ]] 2025-12-04T09:26:09.5833230Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T09:26:09.5836341Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:26:09.6122587Z + source .ci/pytorch/common-build.sh 2025-12-04T09:26:09.6124508Z ++ [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *win-* ]] 2025-12-04T09:26:09.6131044Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T09:26:09.6141474Z +++ cd .ci/pytorch 2025-12-04T09:26:09.6141819Z +++ pwd -P 2025-12-04T09:26:09.6213073Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch 2025-12-04T09:26:09.6213564Z ++ [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *-pch* ]] 2025-12-04T09:26:09.6213963Z ++ which sccache 2025-12-04T09:26:09.6280732Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]] 2025-12-04T09:26:09.6281102Z ++ sccache --stop-server 2025-12-04T09:26:09.6311704Z ++ true 2025-12-04T09:26:09.6311966Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-12-04T09:26:09.6322912Z ++ trap_add sccache_epilogue EXIT 2025-12-04T09:26:09.6323218Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T09:26:09.6323590Z ++ shift 2025-12-04T09:26:09.6323795Z ++ for trap_add_name in "$@" 2025-12-04T09:26:09.6330698Z ++++ trap -p EXIT 2025-12-04T09:26:09.6334229Z +++ eval 'extract_trap_cmd ' 2025-12-04T09:26:09.6334633Z ++++ extract_trap_cmd 2025-12-04T09:26:09.6334944Z ++++ printf '%s\n' '' 2025-12-04T09:26:09.6335276Z +++ printf '%s\n' sccache_epilogue 2025-12-04T09:26:09.6336826Z ++ trap -- ' 2025-12-04T09:26:09.6337140Z sccache_epilogue' EXIT 2025-12-04T09:26:09.6337443Z ++ [[ -n 1 ]] 2025-12-04T09:26:09.6337934Z ++ echo 'Skipping sccache server initialization, setting environment variables' 2025-12-04T09:26:09.6338575Z Skipping sccache server initialization, setting 
environment variables 2025-12-04T09:26:09.6339408Z ++ export SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:26:09.6339688Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:26:09.6340029Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:26:09.6340469Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:26:09.6345292Z ++ export RUST_LOG=sccache::server=error 2025-12-04T09:26:09.6345626Z ++ RUST_LOG=sccache::server=error 2025-12-04T09:26:09.6345912Z ++ sccache --zero-stats 2025-12-04T09:26:10.0159711Z Statistics zeroed. 2025-12-04T09:26:10.0168685Z ++ which ccache 2025-12-04T09:26:10.0249864Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *rocm* ]] 2025-12-04T09:26:10.0250418Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *s390x* ]] 2025-12-04T09:26:10.0250848Z + [[ -d /var/lib/jenkins/workspace ]] 2025-12-04T09:26:10.0252747Z ++ stat -c %u /var/lib/jenkins/workspace 2025-12-04T09:26:10.0269832Z + WORKSPACE_ORIGINAL_OWNER_ID=1000 2025-12-04T09:26:10.0270182Z + trap_add cleanup_workspace EXIT 2025-12-04T09:26:10.0270489Z + trap_add_cmd=cleanup_workspace 2025-12-04T09:26:10.0270742Z + shift 2025-12-04T09:26:10.0270941Z + for trap_add_name in "$@" 2025-12-04T09:26:10.0276976Z +++ trap -p EXIT 2025-12-04T09:26:10.0288312Z ++ eval 'extract_trap_cmd trap -- '\'' 2025-12-04T09:26:10.0288711Z sccache_epilogue'\'' EXIT' 2025-12-04T09:26:10.0288983Z +++ extract_trap_cmd trap -- ' 2025-12-04T09:26:10.0289258Z sccache_epilogue' EXIT 2025-12-04T09:26:10.0289497Z +++ printf '%s\n' ' 2025-12-04T09:26:10.0289726Z sccache_epilogue' 2025-12-04T09:26:10.0289969Z ++ printf '%s\n' cleanup_workspace 2025-12-04T09:26:10.0290261Z + trap -- ' 2025-12-04T09:26:10.0290459Z sccache_epilogue 2025-12-04T09:26:10.0290688Z cleanup_workspace' EXIT 2025-12-04T09:26:10.0290984Z + sudo chown -R jenkins /var/lib/jenkins/workspace 2025-12-04T09:26:11.0776364Z + git config --global --add safe.directory /var/lib/jenkins/workspace 2025-12-04T09:26:11.0798849Z + [[ 
linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *cuda* ]] 2025-12-04T09:26:11.0801844Z ++ python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))' 2025-12-04T09:26:11.5242450Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:26:11.5243063Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']' 2025-12-04T09:26:11.5248265Z +++ realpath .ci/pytorch/test.sh 2025-12-04T09:26:11.5260980Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh 2025-12-04T09:26:11.5270353Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch 2025-12-04T09:26:11.5271431Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:26:11.5272028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace 2025-12-04T09:26:11.5272486Z + patch -p4 2025-12-04T09:26:11.5286650Z patching file cudadrv/driver.py 2025-12-04T09:26:11.5287845Z Hunk #1 succeeded at 357 (offset -8 lines). 
2025-12-04T09:26:11.5359827Z + popd 2025-12-04T09:26:11.5360036Z ~/workspace 2025-12-04T09:26:11.5360268Z + echo 'Environment variables:' 2025-12-04T09:26:11.5360548Z Environment variables: 2025-12-04T09:26:11.5360782Z + env 2025-12-04T09:26:11.5370843Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:26:11.5371469Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:26:11.5372015Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck 2025-12-04T09:26:11.5372841Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:26:11.5373232Z HOSTNAME=5d0babf71ea3 2025-12-04T09:26:11.5373928Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.5374778Z GITHUB_ACTION=__run_3 2025-12-04T09:26:11.5375111Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:26:11.5375494Z GITHUB_RUN_NUMBER=19107 2025-12-04T09:26:11.5375760Z TEST_CONFIG=default 2025-12-04T09:26:11.5376001Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:26:11.5376313Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:26:11.5376618Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:26:11.5377254Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:26:11.5377537Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:26:11.5377805Z GITHUB_REF_TYPE=branch 2025-12-04T09:26:11.5378089Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:26:11.5378407Z XLA_CUDA= 2025-12-04T09:26:11.5378631Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:26:11.5379023Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:26:11.5379688Z *** 2025-12-04T09:26:11.5379888Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:26:11.5380169Z GITHUB_ACTIONS=true 2025-12-04T09:26:11.5380417Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:26:11.5380742Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:26:11.5381128Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:26:11.5381499Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 
2025-12-04T09:26:11.5382038Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main 2025-12-04T09:26:11.5382563Z UCC_HOME=/usr 2025-12-04T09:26:11.5382775Z VERBOSE_TEST_LOGS=False 2025-12-04T09:26:11.5383017Z GITHUB_REF=refs/heads/main 2025-12-04T09:26:11.5383265Z SHARD_NUMBER=2 2025-12-04T09:26:11.5383480Z GITHUB_REF_PROTECTED=true 2025-12-04T09:26:11.5383726Z HOME=/var/lib/jenkins 2025-12-04T09:26:11.5384012Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:26:11.5384328Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:26:11.5384657Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:26:11.5384985Z USE_SYSTEM_NCCL=1 2025-12-04T09:26:11.5385198Z NUM_TEST_SHARDS=8 2025-12-04T09:26:11.5385404Z UCX_HOME=/usr 2025-12-04T09:26:11.5385971Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.5387063Z JOB_NAME=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check) 2025-12-04T09:26:11.5388127Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.5388949Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:26:11.5389452Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:26:11.5389701Z DASHBOARD_TAG= 2025-12-04T09:26:11.5389910Z GITHUB_RUN_ID=19922826259 2025-12-04T09:26:11.5390154Z INSTALLED_OPENBLAS= 2025-12-04T09:26:11.5390771Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.5391506Z GITHUB_ACTOR=huydhn 2025-12-04T09:26:11.5391884Z PR_NUMBER= 2025-12-04T09:26:11.5392083Z DESIRED_CUDA=12.8.1 2025-12-04T09:26:11.5392303Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:26:11.5392536Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:26:11.5392854Z 
GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:26:11.5393186Z TERM=vt100 2025-12-04T09:26:11.5393377Z INSTALLED_VISION=yes 2025-12-04T09:26:11.5393598Z BRANCH=main 2025-12-04T09:26:11.5393809Z SCCACHE_REGION=us-east-1 2025-12-04T09:26:11.5394057Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:26:11.5394330Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:26:11.5394577Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:26:11.5395086Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:26:11.5395667Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:26:11.5396008Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:26:11.5396328Z REENABLED_ISSUES= 2025-12-04T09:26:11.5396532Z DOCS= 2025-12-04T09:26:11.5396713Z SHLVL=1 2025-12-04T09:26:11.5396894Z MAX_JOBS=14 2025-12-04T09:26:11.5397093Z GITHUB_ACTOR_ID=475357 2025-12-04T09:26:11.5397420Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:26:11.5397795Z GITHUB_REF_NAME=main 2025-12-04T09:26:11.5398153Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:26:11.5398576Z GITHUB_JOB=test 2025-12-04T09:26:11.5398791Z NO_TEST_TIMEOUT=False 2025-12-04T09:26:11.5399110Z TD_DISTRIBUTED=False 2025-12-04T09:26:11.5399360Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:26:11.5399654Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:26:11.5399895Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:26:11.5400155Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:26:11.5400939Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:26:11.5401744Z GITHUB_BASE_REF= 2025-12-04T09:26:11.5401958Z INSTALLED_ACL= 2025-12-04T09:26:11.5402358Z ARTIFACTS_FILE_SUFFIX=test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212 2025-12-04T09:26:11.5402824Z CI=true 2025-12-04T09:26:11.5403023Z GITHUB_REPOSITORY_OWNER=pytorch 
2025-12-04T09:26:11.5403324Z RUST_LOG=sccache::server=error 2025-12-04T09:26:11.5403585Z JOB_ID=57118183212 2025-12-04T09:26:11.5403796Z GITHUB_HEAD_REF= 2025-12-04T09:26:11.5404012Z GITHUB_ACTION_REF= 2025-12-04T09:26:11.5404284Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:26:11.5404621Z TEST_SHOWLOCALS=False 2025-12-04T09:26:11.5404861Z GITHUB_WORKFLOW=periodic 2025-12-04T09:26:11.5405123Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:26:11.5405748Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.5406392Z NO_TD=False 2025-12-04T09:26:11.5406613Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:26:11.5406898Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:26:11.5407204Z _=/usr/bin/env 2025-12-04T09:26:11.5407554Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:26:11.5408437Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T09:26:11.5517737Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-12-04T09:26:11.5518543Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-12-04T09:26:11.5519343Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-12-04T09:26:11.5520047Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-12-04T09:26:11.5520515Z + BUILD_DIR=build 2025-12-04T09:26:11.5520781Z + BUILD_RENAMED_DIR=build_renamed 2025-12-04T09:26:11.5521173Z + BUILD_BIN_DIR=build/bin 2025-12-04T09:26:11.5521431Z + SHARD_NUMBER=2 2025-12-04T09:26:11.5521688Z + NUM_TEST_SHARDS=8 2025-12-04T09:26:11.5521932Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:26:11.5522264Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:26:11.5522541Z + export VALGRIND=ON 2025-12-04T09:26:11.5522964Z + VALGRIND=ON 2025-12-04T09:26:11.5523301Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == 
*clang9* ]] 2025-12-04T09:26:11.5523806Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *xpu* ]] 2025-12-04T09:26:11.5524194Z + detect_cuda_arch 2025-12-04T09:26:11.5524519Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *cuda* ]] 2025-12-04T09:26:11.5524910Z + command -v nvidia-smi 2025-12-04T09:26:11.5525150Z /usr/bin/nvidia-smi 2025-12-04T09:26:11.5528932Z ++ nvidia-smi --query-gpu=compute_cap --format=csv 2025-12-04T09:26:11.5529389Z ++ tail -n 1 2025-12-04T09:26:11.5809844Z + TORCH_CUDA_ARCH_LIST=8.6 2025-12-04T09:26:11.5810275Z + export TORCH_CUDA_ARCH_LIST 2025-12-04T09:26:11.5810676Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *s390x* ]] 2025-12-04T09:26:11.5811072Z + [[ 0 == \1 ]] 2025-12-04T09:26:11.5811276Z + [[ True == \1 ]] 2025-12-04T09:26:11.5811596Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *bazel* ]] 2025-12-04T09:26:11.5814808Z ++ realpath build/custom_test_artifacts 2025-12-04T09:26:11.6217458Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2025-12-04T09:26:11.6217971Z + [[ -n '' ]] 2025-12-04T09:26:11.6218193Z + echo 'Environment variables' 2025-12-04T09:26:11.6218462Z Environment variables 2025-12-04T09:26:11.6218689Z + env 2025-12-04T09:26:11.6372262Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:26:11.6373432Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:26:11.6374955Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck 2025-12-04T09:26:11.6376228Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:26:11.6377140Z HOSTNAME=5d0babf71ea3 2025-12-04T09:26:11.6377907Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.6378698Z GITHUB_ACTION=__run_3 2025-12-04T09:26:11.6378952Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:26:11.6379317Z GITHUB_RUN_NUMBER=19107 2025-12-04T09:26:11.6379738Z TEST_CONFIG=default 2025-12-04T09:26:11.6379980Z 
GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:26:11.6380287Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:26:11.6380590Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:26:11.6380996Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:26:11.6381277Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:26:11.6381535Z GITHUB_REF_TYPE=branch 2025-12-04T09:26:11.6381777Z TORCH_CUDA_ARCH_LIST=8.6 2025-12-04T09:26:11.6382063Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:26:11.6382459Z XLA_CUDA= 2025-12-04T09:26:11.6382762Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:26:11.6383509Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:26:11.6383916Z *** 2025-12-04T09:26:11.6384156Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:26:11.6384504Z GITHUB_ACTIONS=true 2025-12-04T09:26:11.6384808Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:26:11.6385241Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:26:11.6385767Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:26:11.6386268Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:26:11.6386881Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main 2025-12-04T09:26:11.6387461Z UCC_HOME=/usr 2025-12-04T09:26:11.6387687Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:26:11.6387940Z VERBOSE_TEST_LOGS=False 2025-12-04T09:26:11.6388180Z GITHUB_REF=refs/heads/main 2025-12-04T09:26:11.6388429Z SHARD_NUMBER=2 2025-12-04T09:26:11.6388637Z GITHUB_REF_PROTECTED=true 2025-12-04T09:26:11.6388888Z HOME=/var/lib/jenkins 2025-12-04T09:26:11.6389157Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:26:11.6389468Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:26:11.6389836Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:26:11.6390159Z USE_SYSTEM_NCCL=1 2025-12-04T09:26:11.6390375Z NUM_TEST_SHARDS=8 2025-12-04T09:26:11.6390582Z UCX_HOME=/usr 2025-12-04T09:26:11.6391361Z 
GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.6392516Z JOB_NAME=linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check) 2025-12-04T09:26:11.6393584Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.6394394Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:26:11.6394899Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:26:11.6395145Z DASHBOARD_TAG= 2025-12-04T09:26:11.6395357Z GITHUB_RUN_ID=19922826259 2025-12-04T09:26:11.6395594Z INSTALLED_OPENBLAS= 2025-12-04T09:26:11.6396205Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.6396886Z GITHUB_ACTOR=huydhn 2025-12-04T09:26:11.6397098Z PR_NUMBER= 2025-12-04T09:26:11.6397295Z DESIRED_CUDA=12.8.1 2025-12-04T09:26:11.6397519Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:26:11.6397730Z VALGRIND=ON 2025-12-04T09:26:11.6397942Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:26:11.6398269Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:26:11.6398595Z TERM=vt100 2025-12-04T09:26:11.6398798Z INSTALLED_VISION=yes 2025-12-04T09:26:11.6399012Z BRANCH=main 2025-12-04T09:26:11.6399212Z SCCACHE_REGION=us-east-1 2025-12-04T09:26:11.6399570Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:26:11.6399827Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:26:11.6400074Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:26:11.6400584Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:26:11.6401169Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:26:11.6401507Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:26:11.6401835Z REENABLED_ISSUES= 
2025-12-04T09:26:11.6402036Z DOCS= 2025-12-04T09:26:11.6402211Z SHLVL=1 2025-12-04T09:26:11.6402388Z MAX_JOBS=14 2025-12-04T09:26:11.6402590Z GITHUB_ACTOR_ID=475357 2025-12-04T09:26:11.6402909Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:26:11.6403281Z GITHUB_REF_NAME=main 2025-12-04T09:26:11.6403643Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:26:11.6404051Z GITHUB_JOB=test 2025-12-04T09:26:11.6404269Z NO_TEST_TIMEOUT=False 2025-12-04T09:26:11.6404499Z TD_DISTRIBUTED=False 2025-12-04T09:26:11.6404750Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:26:11.6405035Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:26:11.6405283Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:26:11.6405536Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:26:11.6406311Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:26:11.6407120Z GITHUB_BASE_REF= 2025-12-04T09:26:11.6407333Z INSTALLED_ACL= 2025-12-04T09:26:11.6408011Z ARTIFACTS_FILE_SUFFIX=test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212 2025-12-04T09:26:11.6408520Z CI=true 2025-12-04T09:26:11.6408724Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:26:11.6409027Z RUST_LOG=sccache::server=error 2025-12-04T09:26:11.6409285Z JOB_ID=57118183212 2025-12-04T09:26:11.6409495Z GITHUB_HEAD_REF= 2025-12-04T09:26:11.6409695Z GITHUB_ACTION_REF= 2025-12-04T09:26:11.6409962Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:26:11.6410298Z TEST_SHOWLOCALS=False 2025-12-04T09:26:11.6410536Z GITHUB_WORKFLOW=periodic 2025-12-04T09:26:11.6410796Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:26:11.6411419Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_198bc67c-0846-46e5-96ef-ef7f70bb4eea 2025-12-04T09:26:11.6412051Z NO_TD=False 2025-12-04T09:26:11.6412252Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:26:11.6412541Z 
NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:26:11.6412985Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:26:11.6413395Z _=/usr/bin/env 2025-12-04T09:26:11.6413754Z + echo 'Testing pytorch' 2025-12-04T09:26:11.6414001Z Testing pytorch 2025-12-04T09:26:11.6414206Z + export LANG=C.UTF-8 2025-12-04T09:26:11.6414431Z + LANG=C.UTF-8 2025-12-04T09:26:11.6414630Z + PR_NUMBER= 2025-12-04T09:26:11.6414835Z + [[ default == \d\e\f\a\u\l\t ]] 2025-12-04T09:26:11.6415115Z + export CUDA_VISIBLE_DEVICES=0 2025-12-04T09:26:11.6415380Z + CUDA_VISIBLE_DEVICES=0 2025-12-04T09:26:11.6415630Z + export HIP_VISIBLE_DEVICES=0 2025-12-04T09:26:11.6415893Z + HIP_VISIBLE_DEVICES=0 2025-12-04T09:26:11.6416139Z + [[ default == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T09:26:11.6416433Z + [[ default == \s\l\o\w ]] 2025-12-04T09:26:11.6416822Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *slow-gradcheck* ]] 2025-12-04T09:26:11.6417290Z + export PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 2025-12-04T09:26:11.6417619Z + PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 2025-12-04T09:26:11.6417928Z + export PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:26:11.6418257Z + PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:26:11.6418647Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *cuda* ]] 2025-12-04T09:26:11.6419177Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:26:11.6419517Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:26:11.6419814Z + [[ default == *crossref* ]] 2025-12-04T09:26:11.6420175Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *rocm* ]] 2025-12-04T09:26:11.6420789Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *xpu* ]] 2025-12-04T09:26:11.6421292Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *-bazel-* ]] 2025-12-04T09:26:11.6421705Z + pip_install ninja==1.10.2 2025-12-04T09:26:11.6422051Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T09:26:11.6422556Z + python3 -m pip install 
--progress-bar off ninja==1.10.2 2025-12-04T09:26:12.1251973Z Collecting ninja==1.10.2 2025-12-04T09:26:12.1500706Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T09:26:12.1861461Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-12-04T09:26:12.5967232Z Installing collected packages: ninja 2025-12-04T09:26:12.5967564Z Attempting uninstall: ninja 2025-12-04T09:26:12.5975027Z Found existing installation: ninja 1.11.1.4 2025-12-04T09:26:12.5999143Z Uninstalling ninja-1.11.1.4: 2025-12-04T09:26:12.6110508Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T09:26:12.6863148Z Successfully installed ninja-1.10.2 2025-12-04T09:26:12.7453711Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:26:12.7455404Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:26:12.7456643Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *aarch64* ]] 2025-12-04T09:26:12.7457177Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *asan* ]] 2025-12-04T09:26:12.7457689Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *-debug* ]] 2025-12-04T09:26:12.7458207Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck != *-bazel-* ]] 2025-12-04T09:26:12.7458891Z + echo 'We are not in debug mode: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck. Expect the assertion to pass' 2025-12-04T09:26:12.7459829Z We are not in debug mode: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck. 
Expect the assertion to pass 2025-12-04T09:26:12.7460371Z + cd test 2025-12-04T09:26:12.7460702Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T09:26:14.4177361Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T09:26:14.4177732Z + [[ default == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T09:26:14.4178084Z + [[ default == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T09:26:14.4182310Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T09:26:14.4182998Z + [[ default == *pr_time_benchmarks* ]] 2025-12-04T09:26:14.4183359Z + [[ default == *dynamo_eager* ]] 2025-12-04T09:26:14.4183654Z + [[ default == *aot_eager* ]] 2025-12-04T09:26:14.4183927Z + [[ default == *aot_inductor* ]] 2025-12-04T09:26:14.4184223Z + [[ default == *max_autotune_inductor* ]] 2025-12-04T09:26:14.4184530Z + [[ default == *inductor* ]] 2025-12-04T09:26:14.4184789Z + [[ default == *dynamic* ]] 2025-12-04T09:26:14.4185061Z + [[ default == *cpu* ]] 2025-12-04T09:26:14.4185308Z + [[ default == *xpu* ]] 2025-12-04T09:26:14.4185579Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T09:26:14.4217187Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *libtorch* ]] 2025-12-04T09:26:14.4217740Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *-bazel-* ]] 2025-12-04T09:26:14.4221114Z + cd test 2025-12-04T09:26:14.4221695Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T09:26:16.0802988Z PyTorch built with: 2025-12-04T09:26:16.0803281Z - GCC 11.4 2025-12-04T09:26:16.0803497Z - C++ Version: 201703 2025-12-04T09:26:16.0804095Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:26:16.0804823Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:26:16.0805258Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:26:16.0805588Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T09:26:16.0806149Z - NNPACK is enabled 2025-12-04T09:26:16.0806403Z - CPU capability usage: AVX2 2025-12-04T09:26:16.0806684Z - CUDA Runtime 12.8 2025-12-04T09:26:16.0807023Z - NVCC architecture flags: -gencode;arch=compute_86,code=sm_86 2025-12-04T09:26:16.0807428Z - CuDNN 91.0.2 (built against CUDA 12.9) 2025-12-04T09:26:16.0812770Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32, CUDA_VERSION=12.8, CUDNN_VERSION=9.10.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T09:26:16.0818068Z 2025-12-04T09:26:16.4547183Z + cd test 2025-12-04T09:26:16.4547682Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T09:26:17.7750319Z ATen/Parallel: 2025-12-04T09:26:17.7750631Z at::get_num_threads() : 8 2025-12-04T09:26:17.7750913Z at::get_num_interop_threads() : 16 2025-12-04T09:26:17.7751220Z OpenMP 
201511 (a.k.a. OpenMP 4.5) 2025-12-04T09:26:17.7751507Z omp_get_max_threads() : 8 2025-12-04T09:26:17.7752097Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:26:17.7752690Z mkl_get_max_threads() : 8 2025-12-04T09:26:17.7753066Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:26:17.7753509Z std::thread::hardware_concurrency() : 16 2025-12-04T09:26:17.7753814Z Environment variables: 2025-12-04T09:26:17.7754411Z OMP_NUM_THREADS : [not set] 2025-12-04T09:26:17.7764440Z MKL_NUM_THREADS : [not set] 2025-12-04T09:26:17.7765119Z ATen parallel backend: OpenMP 2025-12-04T09:26:17.7765312Z 2025-12-04T09:26:18.1021468Z + [[ default == *numpy_2* ]] 2025-12-04T09:26:18.1021947Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *aarch64* ]] 2025-12-04T09:26:18.1022370Z + [[ default == *backward* ]] 2025-12-04T09:26:18.1022680Z + [[ default == *libtorch_agnostic_targetting* ]] 2025-12-04T09:26:18.1023017Z + [[ default == *xla* ]] 2025-12-04T09:26:18.1023265Z + [[ default == *vllm* ]] 2025-12-04T09:26:18.1023537Z + [[ default == *executorch* ]] 2025-12-04T09:26:18.1023815Z + [[ default == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T09:26:18.1024125Z + [[ default == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T09:26:18.1024567Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *libtorch* ]] 2025-12-04T09:26:18.1025024Z + [[ default == distributed ]] 2025-12-04T09:26:18.1025307Z + [[ default == *operator_benchmark* ]] 2025-12-04T09:26:18.1025626Z + [[ default == *operator_microbenchmark* ]] 2025-12-04T09:26:18.1025975Z + [[ default == *attention_microbenchmark* ]] 2025-12-04T09:26:18.1026321Z + [[ default == *inductor_distributed* ]] 2025-12-04T09:26:18.1026629Z + [[ default == *inductor-halide* ]] 2025-12-04T09:26:18.1026930Z + [[ default == *inductor-pallas* ]] 2025-12-04T09:26:18.1027344Z + [[ default == *inductor-triton-cpu* ]] 2025-12-04T09:26:18.1027761Z + [[ 
default == *inductor-micro-benchmark* ]] 2025-12-04T09:26:18.1028112Z + [[ default == *aoti_cross_compile_for_windows* ]] 2025-12-04T09:26:18.1028824Z + [[ default == *huggingface* ]] 2025-12-04T09:26:18.1029099Z + [[ default == *timm* ]] 2025-12-04T09:26:18.1029348Z + [[ default == cachebench ]] 2025-12-04T09:26:18.1029623Z + [[ default == verify_cachebench ]] 2025-12-04T09:26:18.1029915Z + [[ default == *torchbench* ]] 2025-12-04T09:26:18.1030193Z + [[ default == *inductor_cpp_wrapper* ]] 2025-12-04T09:26:18.1030509Z + [[ default == *inductor_core* ]] 2025-12-04T09:26:18.1030790Z + [[ default == *inductor* ]] 2025-12-04T09:26:18.1031047Z + [[ default == *einops* ]] 2025-12-04T09:26:18.1031311Z + [[ default == *dynamo_core* ]] 2025-12-04T09:26:18.1031591Z + [[ default == *dynamo_wrapped* ]] 2025-12-04T09:26:18.1031972Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *rocm* ]] 2025-12-04T09:26:18.1032363Z + [[ 2 == 1 ]] 2025-12-04T09:26:18.1032563Z + [[ 2 == 2 ]] 2025-12-04T09:26:18.1032779Z + [[ 8 -gt 1 ]] 2025-12-04T09:26:18.1032985Z + install_torchvision 2025-12-04T09:26:18.1033225Z + local orig_preload 2025-12-04T09:26:18.1033459Z + local commit 2025-12-04T09:26:18.1033669Z ++ get_pinned_commit vision 2025-12-04T09:26:18.1033942Z ++ cat .github/ci_commit_pins/vision.txt 2025-12-04T09:26:18.1047740Z + commit=617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:26:18.1048071Z + orig_preload= 2025-12-04T09:26:18.1048283Z + '[' -n '' ']' 2025-12-04T09:26:18.1048600Z + [[ linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck == *cuda* ]] 2025-12-04T09:26:18.1049000Z + export FORCE_CUDA=1 2025-12-04T09:26:18.1049224Z + FORCE_CUDA=1 2025-12-04T09:26:18.1049440Z + export WITH_CUDA=1 2025-12-04T09:26:18.1049662Z + WITH_CUDA=1 2025-12-04T09:26:18.1050222Z + pip_build_and_install git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e dist/vision 2025-12-04T09:26:18.1051115Z + local 
build_target=git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:26:18.1051682Z + local wheel_dir=dist/vision 2025-12-04T09:26:18.1051940Z + local found_whl=0 2025-12-04T09:26:18.1052184Z + for file in "${wheel_dir}"/*.whl 2025-12-04T09:26:18.1052480Z + [[ -f dist/vision/*.whl ]] 2025-12-04T09:26:18.1052727Z + '[' 0 == 0 ']' 2025-12-04T09:26:18.1053402Z + python3 -m pip wheel --no-build-isolation --no-deps -w dist/vision git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:26:18.4295568Z Collecting git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:26:18.4299528Z Cloning https://github.com/pytorch/vision.git (to revision 617079d944b0e72632311c30ae2bbdf1168b901e) to /tmp/pip-req-build-p9mt7q5u 2025-12-04T09:26:18.4473952Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-p9mt7q5u 2025-12-04T09:26:20.2135383Z Running command git rev-parse -q --verify 'sha^617079d944b0e72632311c30ae2bbdf1168b901e' 2025-12-04T09:26:20.2160387Z Running command git fetch -q https://github.com/pytorch/vision.git 617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:26:20.3515998Z Resolved https://github.com/pytorch/vision.git to commit 617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:26:22.4736793Z Preparing metadata (pyproject.toml) ... done 2025-12-04T09:26:22.4769030Z Building wheels for collected packages: torchvision 2025-12-04T09:27:38.0803062Z Building wheel for torchvision (pyproject.toml) ...
done 2025-12-04T09:27:38.0834177Z Created wheel for torchvision: filename=torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl size=1786563 sha256=7874054a75ed282a987b4c93ab0d0596d77e962e2e31afef205dc3d79b7b2778 2025-12-04T09:27:38.0837721Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/12/b2/29/1f82685c5b5173629e1f36a9b93989ce92ce563e5fb91d27ac 2025-12-04T09:27:38.0873283Z Successfully built torchvision 2025-12-04T09:27:38.1980254Z + for file in "${wheel_dir}"/*.whl 2025-12-04T09:27:38.1980812Z + pip_install_whl dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:27:38.1981494Z + args=('dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl') 2025-12-04T09:27:38.1981941Z + local args 2025-12-04T09:27:38.1982328Z + [[ dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl == *\ * ]] 2025-12-04T09:27:38.1982816Z + for path in "${args[@]}" 2025-12-04T09:27:38.1983297Z + echo 'Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl' 2025-12-04T09:27:38.1983991Z Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:27:38.1984787Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:27:38.5300094Z Processing ./dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:27:38.5396402Z Installing collected packages: torchvision 2025-12-04T09:27:39.0106028Z Successfully installed torchvision-0.25.0a0+617079d 2025-12-04T09:27:39.0499352Z + '[' -n '' ']' 2025-12-04T09:27:39.0499628Z + test_python_shard 2 2025-12-04T09:27:39.0499936Z + [[ -z 8 ]] 2025-12-04T09:27:39.0500774Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard 2 8 --verbose --upload-artifacts-while-running
2025-12-04T09:27:42.1611240Z Excluding doctests Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1615285Z Excluding test_meta Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1615996Z Excluding test_hub Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1616689Z Excluding test_fx Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1617400Z Excluding test_decomp Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1618213Z Excluding test_cpp_extensions_jit Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1619085Z Excluding test_jit Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1619829Z Excluding test_matmul_cuda Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1620565Z Excluding test_ops Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1621566Z Excluding test_ops_jit Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1622360Z Excluding dynamo/test_recompile_ux Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1623262Z Excluding inductor/test_compiled_optimizers Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1624188Z Excluding inductor/test_cutlass_backend Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1625077Z Excluding inductor/test_max_autotune Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:42.1625975Z Excluding inductor/test_select_algorithm Running in slow gradcheck mode, skipping tests that don't use gradcheck. 
2025-12-04T09:27:42.1626840Z Excluding inductor/test_smoke Running in slow gradcheck mode, skipping tests that don't use gradcheck. 2025-12-04T09:27:44.1255196Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2025-12-04T09:27:44.1828264Z Ignoring disabled issues: [''] 2025-12-04T09:27:44.1929178Z Found test times from artifacts 2025-12-04T09:27:44.2331977Z Found test times from artifacts 2025-12-04T09:27:44.2345562Z Running all tests 2025-12-04T09:27:44.2978926Z Running parallel tests on 1 processes 2025-12-04T09:27:44.2985486Z Name: tests to run (est. time: 140.51min) 2025-12-04T09:27:44.2986159Z Serial tests (80): 2025-12-04T09:27:44.2986404Z inductor/test_aot_inductor 2/5 2025-12-04T09:27:44.2986763Z inductor/test_torchinductor_codegen_dynamic_shapes 1/4 2025-12-04T09:27:44.2987158Z inductor/test_torchinductor_opinfo 4/14 2025-12-04T09:27:44.2987493Z inductor/test_torchinductor_opinfo 12/14 2025-12-04T09:27:44.2987817Z inductor/test_flex_attention 6/6 2025-12-04T09:27:44.2988105Z inductor/test_fp8 1/1 2025-12-04T09:27:44.2988364Z dynamo/test_model_output 1/1 2025-12-04T09:27:44.2988644Z inductor/test_triton_kernels 1/1 2025-12-04T09:27:44.2988953Z inductor/test_loop_ordering 1/1 2025-12-04T09:27:44.2989241Z export/test_serdes 1/1 2025-12-04T09:27:44.2989526Z inductor/test_scatter_optimization 1/1 2025-12-04T09:27:44.2989841Z inductor/test_padding 1/1 2025-12-04T09:27:44.2990114Z dynamo/test_callback 1/1 2025-12-04T09:27:44.2990386Z inductor/test_custom_op_autotune 1/1 2025-12-04T09:27:44.2990679Z test_cuda 1/1 2025-12-04T09:27:44.2990896Z test_sparse 1/1 2025-12-04T09:27:44.2991150Z test_ci_sanity_check_fail 1/1 2025-12-04T09:27:44.2991525Z test_ops_fwd_gradients 6/12 2025-12-04T09:27:44.2991793Z test_ops_gradients 2/10 2025-12-04T09:27:44.2992102Z test_ops_gradients 10/10 2025-12-04T09:27:44.2992401Z functorch/test_ops 3/6 2025-12-04T09:27:44.2992654Z 
dynamo/test_after_aot 1/1 2025-12-04T09:27:44.2992970Z inductor/test_snode_runtime 1/1 2025-12-04T09:27:44.2993395Z inductor/test_compiled_autograd 1/2 2025-12-04T09:27:44.2993736Z test_testing 1/1 2025-12-04T09:27:44.2993986Z inductor/test_autoheuristic 1/1 2025-12-04T09:27:44.2994287Z inductor/test_cutedsl_template 1/1 2025-12-04T09:27:44.2994593Z inductor/test_benchmark_fusion 1/1 2025-12-04T09:27:44.2994886Z inductor/test_remote_cache 1/1 2025-12-04T09:27:44.2995194Z inductor/test_coordinate_descent_tuner 1/1 2025-12-04T09:27:44.2995621Z inductor/test_inplace_padding 1/1 2025-12-04T09:27:44.2995921Z inductor/test_cudacodecache 1/1 2025-12-04T09:27:44.2996301Z inductor/test_minifier_utils 1/1 2025-12-04T09:27:44.2996596Z inductor/test_debug_trace 1/1 2025-12-04T09:27:44.2996929Z export/test_tree_utils 1/1 2025-12-04T09:27:44.2997307Z inductor/test_triton_wrapper 1/1 2025-12-04T09:27:44.2997640Z inductor/test_static_cuda_launcher 1/1 2025-12-04T09:27:44.2997964Z inductor/test_provenance_tracing 1/1 2025-12-04T09:27:44.2998277Z inductor/test_memory_planning 1/1 2025-12-04T09:27:44.2998581Z export/test_cpp_serdes 1/1 2025-12-04T09:27:44.2998853Z inductor/test_control_flow 2/4 2025-12-04T09:27:44.2999324Z test_sort_and_select 1/1 2025-12-04T09:27:44.2999598Z functorch/test_rearrange 1/1 2025-12-04T09:27:44.2999908Z test_package 1/1 2025-12-04T09:27:44.3000150Z test_mkl_verbose 1/1 2025-12-04T09:27:44.3000400Z test_utils_config_module 1/1 2025-12-04T09:27:44.3000663Z test_hop_infra 1/1 2025-12-04T09:27:44.3000915Z test_appending_byte_serializer 1/1 2025-12-04T09:27:44.3001215Z test_ao_sparsity 1/1 2025-12-04T09:27:44.3001462Z test_extension_utils 1/1 2025-12-04T09:27:44.3001723Z nn/attention/test_fa4 1/1 2025-12-04T09:27:44.3001991Z typing/test_python_operators 1/1 2025-12-04T09:27:44.3002281Z torch_np/test_dtype 1/1 2025-12-04T09:27:44.3002531Z test_file_check 1/1 2025-12-04T09:27:44.3002767Z profiler/test_kineto 1/1 2025-12-04T09:27:44.3003034Z 
functorch/test_ac_knapsack 1/1 2025-12-04T09:27:44.3003330Z torch_np/test_nep50_examples 1/1 2025-12-04T09:27:44.3003598Z test_torch 1/1 2025-12-04T09:27:44.3003824Z xpu/test_gemm 1/1 2025-12-04T09:27:44.3004060Z test_binary_ufuncs 1/1 2025-12-04T09:27:44.3004297Z test_modules 2/4 2025-12-04T09:27:44.3004560Z torch_np/numpy_tests/linalg/test_linalg 1/1 2025-12-04T09:27:44.3004906Z torch_np/numpy_tests/core/test_dtype 1/1 2025-12-04T09:27:44.3005208Z lazy/test_debug_util 1/1 2025-12-04T09:27:44.3005467Z nn/test_load_state_dict 1/1 2025-12-04T09:27:44.3005723Z test_shape_ops 1/1 2025-12-04T09:27:44.3006065Z profiler/test_memory_profiler 1/1 2025-12-04T09:27:44.3006344Z test_indexing 1/1 2025-12-04T09:27:44.3006568Z test_type_info 1/1 2025-12-04T09:27:44.3006812Z functorch/test_aotdispatch 1/1 2025-12-04T09:27:44.3007092Z test_scatter_gather_ops 1/1 2025-12-04T09:27:44.3007362Z test_cuda_multigpu 1/1 2025-12-04T09:27:44.3007652Z torch_np/numpy_tests/lib/test_index_tricks 1/1 2025-12-04T09:27:44.3008302Z test_jit_autocast 1/1 2025-12-04T09:27:44.3008558Z test_xnnpack_integration 1/1 2025-12-04T09:27:44.3008828Z nn/test_init 1/1 2025-12-04T09:27:44.3009054Z test_mobile_optimizer 1/1 2025-12-04T09:27:44.3009324Z test_type_promotion 1/1 2025-12-04T09:27:44.3009587Z test_reductions 1/1 2025-12-04T09:27:44.3009859Z test_autoload_disable 1/1 2025-12-04T09:27:44.3010117Z Parallel tests (0): 2025-12-04T09:27:44.3010362Z Name: excluded (est. time: 0.0min) 2025-12-04T09:27:44.3010627Z Serial tests (0): 2025-12-04T09:27:44.3010855Z Parallel tests (0): 2025-12-04T09:27:44.3011258Z Running inductor/test_aot_inductor 2/5 ... 
[2025-12-04 09:27:44.299075][936.308299477] 2025-12-04T09:27:44.3011734Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:27:44.3012791Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=2', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:27:44.299499] 2025-12-04T09:36:10.8931079Z 2025-12-04T09:36:10.8941419Z inductor/test_aot_inductor 2/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_2.5_ac1d7e2a37fbed81_.log 2025-12-04T09:36:10.9035970Z Running 184 items in this shard: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotuning_args_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_boolean_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_3_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_4_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_composed_dynamic_size_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_cpu_predicate_cuda_operands_max_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_False_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_disable_one_pass_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_multiple_outputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_convolution_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_duplicated_params_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dynamic_scalar_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_foreach_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_inf_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_dynamic_dim_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_no_args_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_tensor_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_on_gpu_device1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_abs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_permute_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_squeeze_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_interleave_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replicate_on_devices_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_large_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_shape_failed_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scaled_dot_product_efficient_attention_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scatter_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_embed_kernel_binary_False_max_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_from_multi_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_with_unbacked_add_and_mul_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_so_without_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_sympy_fn_like_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aliased_buffer_reuse_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_amp_fallback_random_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_cpp_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_fp8_dtype_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_tensor_meta_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_int64_user_defined_triton_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_bool_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_type_propagation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dup_unbacked_sym_decl_with_refinement_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_scalar_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_embedding_bag_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fake_tensor_device_validation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fill__fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_index_put_with_none_index_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_inf_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_input_codegen_with_sympy_expr_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_mmaped_weights_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_tensor_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_on_gpu_device1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quanatized_int8_linear_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_replicate_on_devices_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_device_type_failed_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_dot_product_efficient_attention_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scatter_fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_shifted_constraint_ranges_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_embed_kernel_binary_True_max_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_so_without_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_dynamic_launcher_grid_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_dynamic_shape_with_div_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_expr_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_fn_like_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_with_none_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_view_outputs_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_cudagraphs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_constant_tensor_name_collision_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_user_defined_triton_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_profiler_enable_kernel_profile_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_runtime_asserts_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_int64_user_defined_triton_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_cpu_predicate_cuda_operands_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_nested_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_non_tensor_predicates_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_predicate_on_cpu_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_share_predicate_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_outer_code_before_after_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_copy_non_blocking_is_pinned_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dup_unbacked_sym_decl_with_refinement_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_cat_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_foreach_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_dynamic_dim_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_mmaped_weights_on_disk_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_masked_select_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misaligned_input_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_cubin_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_mixed_device_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_multiple_output_alias_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_default_gpu_device_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_tensor_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_device_type_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_shape_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_same_backing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scatter_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_True_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_so_without_weight_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_expr_indexing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symbool_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_fn_like_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_weird_param_order_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_next_power_of_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_constant_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_using_model_name_for_files_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_outer_buffers_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_True_mps 
2025-12-04T09:36:10.9129617Z
2025-12-04T09:36:10.9129927Z Finished inductor/test_aot_inductor 2/5 ... [2025-12-04 09:36:10.892539][1442.901763277], took 8.44min
2025-12-04T09:36:10.9131055Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d2163ec8f4306bf7.xml
2025-12-04T09:36:11.3136341Z Uploading artifacts took 0.11 seconds
2025-12-04T09:36:11.3139699Z Running inductor/test_torchinductor_codegen_dynamic_shapes 1/4 ... [2025-12-04 09:36:11.313643][1443.322864938]
2025-12-04T09:36:11.3140310Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:36:11.3144405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_codegen_dynamic_shapes.py', '--shard-id=1', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:36:11.314078]
2025-12-04T09:44:42.2450687Z
2025-12-04T09:44:42.2452258Z inductor/test_torchinductor_codegen_dynamic_shapes 1/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_1.4_295ecc74e041d7f8_.log
2025-12-04T09:44:42.2727071Z Running 440 items in this shard: test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_matmul_4bit_bf16_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_avg_pool1d_argmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex6_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex7_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_const_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_support_out_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_with_scalar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange6_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_as_strided_on_views_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_as_strided_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_assert_alignment_op_name_fail_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_assert_alignment_op_name_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_assert_size_stride_op_name_fail_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_assert_size_stride_op_name_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d4_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool3d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_batch_norm_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_negative_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_unbacked_empty_1d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_upcasting_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_compar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_config_option_dont_assume_alignment_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_const_int32_to_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv3d_channels_last_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_functional_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_scalar_with_gpu_tensor_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_tensor_with_cpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_tensor_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cudnn_rnn_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_op_fixed_layout_sequential_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_data_type_propogation_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_device_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div6_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div9_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div_by_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_bfloat16_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_bfloat16_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_float64_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int16_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int16_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_float64_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_embedding_bag_byte_unpack_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_embedding_bag_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_exp2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_exp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_expand_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fallback_mutable_op_list_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fft_real_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fmin_fmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_full_like_sliced_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_functionalize_rng_wrappers_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_gather2_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_generated_code_has_alignment_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_gpu_scalar_with_cpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_gpu_scalar_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_failed_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inner_fn_str_and_stride_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_int8_weight_only_quant_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_invalid_operand_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_isinf2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lgamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_like_rands_sliced_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linspace3_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linspace4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logcumsumexp_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_long_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_masked_fill_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d_with_indices_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d_with_indices_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mean_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_misaligned_address_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mix_device_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_move_arange_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multi_gpu_recompile_on_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multi_threading_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_assert_inside_triton_kernel_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_sort_stable_False_descending_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_to_num_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_narrow_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_no_op_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_no_specization_over_symbolic_value_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_permute1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_philox_rand_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pixel_shuffle_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_airy_ai_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_bessel_y1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_entr_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_gammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_hermite_polynomial_he_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_log_ndtr_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_modified_bessel_i1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_modified_bessel_k1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_ndtri_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_scaled_modified_bessel_k0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_t_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_w_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_polar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_prod_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randint_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randint_int64_mod_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reflection_pad2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reinterpret_dtypeview_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_view_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_replication_pad_errors_with_bool_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_require_stride_expanded_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_roll_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scalar_cpu_tensor_arg_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter_add2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_unaligned_mask_freezing_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_select_scatter_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_shape_padding_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_signbit_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_mutation3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_bool_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_transpose_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumprod_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumprod_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumsum_low_prec_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_failed_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_reduction_with_int64_size_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_with_list_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_with_sizes_with_unbacked_symints_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_with_unbacked_symints_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_stack_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_std_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_strided_inputs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_tensor_index_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_topk_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_triton_argmin_argmax_transpose_logical_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_triton_kernel_bool_param_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_uint4x2_mixed_mm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unroll_small_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unsigned_constant_tensors_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_nearest2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_vdd_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_vectorized_ops_masked_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_view_on_aliased_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_zero_element_mutation_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool2d_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_const_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_const_int_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_addmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aliased_buffer_reuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_allow_reuse_active_if_under_peak_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aoti_eager_override_registration_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aoti_eager_with_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange5_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_as_strided_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_assert_alignment_op_name_fail_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_assert_size_stride_op_name_fail_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_assert_size_stride_op_name_pass_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d_backward3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool3d_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_batch_norm_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bernoulli2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_both_scalars_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_computed_offsets_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int16_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_uint8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_uint8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_buffer_copied_in_graph_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_buffer_copied_in_graph_with_different_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_empty_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_upcasting_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_compar_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_consecutive_split_cumprod_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_const_int32_to_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_2d_strides_nonpositive_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_fill_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv1d_depthwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_functional_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_convolution3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_with_scalar_src_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cudnn_rnn_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_no_mask_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_zero_dim_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_default_layout_constraint_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_data_type_propogation_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_device_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div9_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div_precision_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dont_constant_fold_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dropout3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dropout_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_bfloat16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_bfloat16_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_uint8_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_embedding_bag_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_exact_stride_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_exp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_expand_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_expand_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fill2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_flip_cat_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float16_to_int16_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float32_to_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float_repr_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fmod_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fractional_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_full_like_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_gather1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_gather_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_generated_code_has_size_stride_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_glu_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_arange2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_constant_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_mutation_real_name_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_pad_dynamic_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_grid_sampler_expand_preserves_view_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_float_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_indirect_load_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inductor_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_resize_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_where_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_insignificant_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_int8_weight_only_quant_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_offset_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_leaky_relu_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands_sliced_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linalg_eig_stride_consistency_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear_dynamic_maxautotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linspace1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linspace4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_mode_fallback_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_mode_not_decompose_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_log_fp64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_log_softmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_logsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_masked_fill_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d7_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_min_max_reduction_nan_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_misaligned_address_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mixed_mm2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mixed_mm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_move_arange_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_prime_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_False_descending_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_True_descending_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_narrow_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_neg_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_new_ones_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_no_op_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nonzero_unbacked_refinement_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pad_view_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pattern_matcher_unbacked_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_philox_rand_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_bessel_y1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_v_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_digamma_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_entr_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_hermite_polynomial_he_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_i0e_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_legendre_polynomial_p_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_ndtr_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_ndtri_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_profiler_mark_wrapper_call_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_distribution_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_int64_mod_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randint_kernel_count_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_slice_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scaled_dot_product_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_reduce2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_reduce3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_searchsorted_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_signbit_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_simplify_loops_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_view_with_graph_break_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_one_kernel_loop_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_special_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_reduction_with_int64_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_with_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_squeeze_varargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_stack_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_std_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum_int_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tan_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tensor1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tensor3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_to_device_constant_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_transposed_propagates_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unbacked_floordiv_simplify_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsigned_constant_tensors_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_bilinear2d_b_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_nearest1d_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_nearest2d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_nearest3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_var_mean_div_by_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_var_mean_tile_reduction_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_vdd_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_weight_norm_conv2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_where_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_zero_element_mutation_dynamic_shapes_cuda 2025-12-04T09:44:42.2994833Z 2025-12-04T09:44:42.2995289Z Finished inductor/test_torchinductor_codegen_dynamic_shapes 1/4 ... [2025-12-04 09:44:42.245756][1954.254979915], took 8.52min 2025-12-04T09:44:42.2996757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-7dfb99a0e36ebc6b.xml 2025-12-04T09:44:42.3346945Z Running inductor/test_torchinductor_opinfo 4/14 ... 
[2025-12-04 09:44:42.334287][1954.343507072] 2025-12-04T09:44:42.3347516Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:44:42.3350251Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=4', '--num-shards=14', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:44:42.334645] 2025-12-04T09:55:07.4875911Z 2025-12-04T09:55:07.4879303Z inductor/test_torchinductor_opinfo 4/14 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_4.14_2b71ae42f7581618_.log 2025-12-04T09:55:07.5058708Z Running 246 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_lengths_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_complex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igamma_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_householder_product_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_normalize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_multinomial_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_layer_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardswish_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_kl_div_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_fro_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_fro_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pca_lowrank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_exponential_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vdot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_bool
2025-12-04T09:55:07.5213908Z
2025-12-04T09:55:07.5214280Z Finished inductor/test_torchinductor_opinfo 4/14 ... [2025-12-04 09:55:07.488269][2579.497491388], took 10.42min
2025-12-04T09:55:07.5215600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f45bd9366a90530e.xml
2025-12-04T09:55:07.5735987Z Running inductor/test_torchinductor_opinfo 12/14 ... [2025-12-04 09:55:07.573132][2579.582351573]
2025-12-04T09:55:07.5736738Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:55:07.5740027Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=12', '--num-shards=14', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:55:07.573533]
2025-12-04T10:05:26.2498727Z
2025-12-04T10:05:26.2499852Z inductor/test_torchinductor_opinfo 12/14 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_12.14_f1debdb3c47cb0ae_.log
2025-12-04T10:05:26.2638404Z Running 257 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rand___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_not_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_right_shift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cauchy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_einsum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gradient_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_rank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_triangular_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorsolve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_log_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanquantile_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hann_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_int32 2025-12-04T10:05:26.2770553Z 2025-12-04T10:05:26.2770915Z Finished inductor/test_torchinductor_opinfo 12/14 ... [2025-12-04 10:05:26.250225][3198.259448465], took 10.31min 2025-12-04T10:05:26.2772344Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-85306c1f70284b1c.xml 2025-12-04T10:05:26.5353596Z Uploading artifacts took 0.20 seconds 2025-12-04T10:05:26.5357542Z Running inductor/test_flex_attention 6/6 ... [2025-12-04 10:05:26.535409][3198.544632294] 2025-12-04T10:05:26.5358041Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:05:26.5361891Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_attention.py', '--shard-id=6', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:05:26.535751] 2025-12-04T10:15:31.7161279Z 2025-12-04T10:15:31.7162504Z inductor/test_flex_attention 6/6 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_attention_6.6_cafbaa2a62098057_.log 2025-12-04T10:15:31.7242586Z Running 141 items in this shard: test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod3_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod4_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_aot_eager_gradcheck_score_mod5_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_autograd_function_in_score_mod_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_backend_triton_decode_errors_with_non_power_of_two_gqa_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE3_cuda_bfloat16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod1_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE3_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_seqlen_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod5_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod7_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_cant_lower_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_buffers_all_dims_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_score_mod_aot_eager_gradcheck_score_mod_name__head_offset_mode_aot_eager_cuda, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_captured_wrong_device_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_custom_score_mod_layout_freeze_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_dependent_causal_bidirectional_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order1_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_eager_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order3_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_fully_masked_out_rows_0_check_compile_False_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_function_composition_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_index_weird1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod5_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims1_score_mod6_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_load_from_view_buffer_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_only_return_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_lse_masked_output_backend_flex_decode_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_max_autotune_with_captured_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_mixed_device_error_message_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_mixed_dtypes_fails_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_multiple_score_mod_calls2_paged_attention_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_new_empty_mask_mod_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_njt_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod4_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod5_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_aux__rel_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_aux__times_two_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_aux_deprecation_warnings_cuda_float16, 
test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_max__causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_selective_ac_ops_to_save0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_selective_ac_with_max_autotune_short_query_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_skip_odd_keys_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_small_block_mask_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s3_v_s3_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s1_v_s1_do_s1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_tma_with_customer_kernel_options_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_validate_small_embedding_size_error_message_cuda, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod1_cuda_float32, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_doc_mask_clamped_repro_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_forward_pass_with_none_q_indices_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_from_kv_blocks_without_q_computation_full_indices_False_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_getitem_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_init_mismatched_full_q_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_upcast_appropriately_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_comparison_vs_sdpa_with_learnable_bias_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_indirect_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, 
test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float16_cuda 2025-12-04T10:15:31.7319650Z 2025-12-04T10:15:31.7319977Z Finished inductor/test_flex_attention 6/6 ... [2025-12-04 10:15:31.715727][3803.724951519], took 10.09min 2025-12-04T10:15:31.7321147Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-e8dc2e2d2922989b.xml 2025-12-04T10:15:31.8331962Z Running inductor/test_fp8 1/1 ... 
[2025-12-04 10:15:31.832810][3803.842032892] 2025-12-04T10:15:31.8332411Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:31.8338856Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:15:31.833158] 2025-12-04T10:35:19.5664334Z 2025-12-04T10:35:19.5665486Z PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_440b1865b73f9802_.log) 2025-12-04T10:35:19.5667019Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.xml 2025-12-04T10:35:19.5668577Z ============================= test session starts ============================== 2025-12-04T10:35:19.5669359Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.5670041Z cachedir: .pytest_cache 2025-12-04T10:35:19.5670897Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.5671763Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.5672152Z configfile: pytest.ini 2025-12-04T10:35:19.5672904Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.5673794Z collecting ... 
collected 188 items 2025-12-04T10:35:19.5674273Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T10:35:19.5816004Z Running 188 items in this shard: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda 2025-12-04T10:35:19.5931874Z 2025-12-04T10:35:19.5933139Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.5935707Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.5937014Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.5937874Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:19.5938982Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:19.5940047Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.5941018Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.5942247Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.5943410Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.5944522Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.5945918Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 
2025-12-04T10:35:19.5946998Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.5947934Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.5948981Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.5949971Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.5950857Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.5952152Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.5953353Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.5954438Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.5955501Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.5956536Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.5957632Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.5958776Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.5959868Z E1204 10:15:39.059000 73602 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.5960812Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.5961721Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.5962714Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.5963693Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.5964671Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.5965798Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.5966973Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.5968432Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.5969562Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.5971898Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 
16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.5974911Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.5976613Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.5978562Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.5980614Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.5982518Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.5984407Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.5986202Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.5987665Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.5989577Z E1204 10:15:39.059000 73602 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.5991150Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.5992683Z E1204 10:15:39.059000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.5993730Z ('RERUN', {'yellow': True}) [1.6928s] [ 0%] 2025-12-04T10:35:19.5995304Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.5997692Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.5998991Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.5999995Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:19.6001212Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:19.6002431Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.6003684Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 
2025-12-04T10:35:19.6004981Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6006134Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6007309Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6008912Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6009903Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.6011098Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6012503Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.6013729Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.6014886Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.6016335Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6017657Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6018739Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6019914Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6021277Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6022693Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6024189Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6025500Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6026735Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6027935Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.6029152Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6030219Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.6031698Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6032959Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6034214Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6035841Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6037163Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.6039798Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6042466Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6044158Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6045756Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6047208Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6048781Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6050234Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6051896Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6053271Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6054722Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6056013Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6057216Z E1204 10:15:39.356000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6058227Z ('RERUN', {'yellow': True}) [0.2632s] [ 0%]
2025-12-04T10:35:19.6059797Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6061775Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6063063Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.6063918Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:19.6064855Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6065807Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.6066777Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6067809Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6069099Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6070374Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6071460Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6103655Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.6104755Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6106043Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.6107214Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.6108551Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.6109913Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6111350Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6112648Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6113962Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6115296Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6116744Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6118193Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6119572Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6120810Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6122270Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.6123535Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6124770Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.6126027Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6127393Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6128745Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6130320Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6131627Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.6134449Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6139286Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6141200Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6143086Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6144895Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6146654Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6148401Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6150222Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6151838Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6153605Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6155218Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6156898Z E1204 10:15:39.619000 73602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6158125Z FAILED [0.2612s] [ 0%]
2025-12-04T10:35:19.6158326Z
2025-12-04T10:35:19.6158492Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.6159191Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6159726Z Traceback (most recent call last):
2025-12-04T10:35:19.6160348Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6161181Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6162090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6163057Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6164090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6165031Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6165799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6166817Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6167734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6172808Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6173930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6174858Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6175684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6176514Z return self._compile_to_module()
2025-12-04T10:35:19.6177338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6178129Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6179154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6179959Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6180840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6181847Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6182960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6183889Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6184734Z File "/tmp/tmpzcx134wn/6y/c6yly3762l5dpq4zpvhebgpc534lzyzc6pj2sy42ui4qzgnlb6o4.py", line 62, in <module>
2025-12-04T10:35:19.6186028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6186843Z kernel.precompile(
2025-12-04T10:35:19.6187700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6188641Z self._precompile_worker()
2025-12-04T10:35:19.6189511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6190375Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6191624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6192696Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6193494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6194383Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6195326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6196270Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6197061Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6198040Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6198858Z ^
2025-12-04T10:35:19.6199516Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6200178Z
2025-12-04T10:35:19.6200985Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6201978Z
2025-12-04T10:35:19.6201984Z
2025-12-04T10:35:19.6202374Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6203645Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6204672Z
2025-12-04T10:35:19.6204987Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6205702Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6206229Z frames [('total', 1)]
2025-12-04T10:35:19.6206578Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6207141Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6208010Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6208578Z graph_break []
2025-12-04T10:35:19.6209109Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6209774Z Traceback (most recent call last):
2025-12-04T10:35:19.6210544Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6211460Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6212413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6213401Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6214388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6215350Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6216292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6217270Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.6218415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.6219919Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.6221129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.6222075Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.6223310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.6224390Z return self._compile_to_module()
2025-12-04T10:35:19.6225365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.6226452Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.6227534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.6228617Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.6229536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.6230768Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.6232041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.6233112Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.6234194Z File "/tmp/tmpll1ds1ip/c4/cc4bj6oy5lpgnnzsganqw2wyma3jehcszsa6t5lh5n4zrraq2lfh.py", line 62, in <module>
2025-12-04T10:35:19.6235508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.6236434Z kernel.precompile(
2025-12-04T10:35:19.6237447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.6238637Z self._precompile_worker()
2025-12-04T10:35:19.6239823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.6241065Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.6242141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6243145Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6243993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6244920Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6245993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6247086Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6247986Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6249149Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6250226Z ^
2025-12-04T10:35:19.6250991Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6251896Z
2025-12-04T10:35:19.6252753Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6253786Z
2025-12-04T10:35:19.6253792Z
2025-12-04T10:35:19.6254173Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6255387Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6256219Z
2025-12-04T10:35:19.6256712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6257427Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6258192Z frames [('total', 1)]
2025-12-04T10:35:19.6258764Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6259630Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6260483Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6261135Z graph_break []
2025-12-04T10:35:19.6261545Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6262041Z frames [('total', 1)]
2025-12-04T10:35:19.6262469Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6263021Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6263617Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6264167Z graph_break []
2025-12-04T10:35:19.6264521Z =================================== FAILURES ===================================
2025-12-04T10:35:19.6265251Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6265944Z Traceback (most recent call last):
2025-12-04T10:35:19.6266952Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6268034Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6269200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.6270362Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.6271582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.6272819Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.6274018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.6274997Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.6276160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.6277396Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.6278729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.6279707Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.6280722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.6281872Z return self._compile_to_module() 2025-12-04T10:35:19.6282755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.6283770Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.6284860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.6285957Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.6286821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in 
load_by_key_path 2025-12-04T10:35:19.6288022Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.6289271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.6290373Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.6291373Z File "/tmp/tmplem41v0m/hn/chnmbhr5xpr4zjplom7zth6gayoaam2rivjf2al6d2mzyjwimps5.py", line 62, in 2025-12-04T10:35:19.6292760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.6293835Z kernel.precompile( 2025-12-04T10:35:19.6294881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.6296029Z self._precompile_worker() 2025-12-04T10:35:19.6297176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.6298317Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.6299705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.6300727Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.6301884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.6302962Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.6304002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.6305179Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.6306230Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.6307359Z def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6308609Z ^
2025-12-04T10:35:19.6309378Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6310457Z
2025-12-04T10:35:19.6311319Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6312354Z
2025-12-04T10:35:19.6312360Z
2025-12-04T10:35:19.6312622Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6314188Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6315293Z
2025-12-04T10:35:19.6315717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6316481Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6317311Z frames [('total', 1)]
2025-12-04T10:35:19.6317765Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6318476Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6319286Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6393533Z graph_break []
2025-12-04T10:35:19.6394022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6394606Z frames [('total', 1)]
2025-12-04T10:35:19.6394986Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6395512Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6399219Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6399644Z graph_break []
2025-12-04T10:35:19.6399992Z
----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.6400485Z frames [('total', 1)]
2025-12-04T10:35:19.6400747Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.6401236Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.6401949Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.6402479Z graph_break []
2025-12-04T10:35:19.6403372Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.xml -
2025-12-04T10:35:19.6404436Z =========================== short test summary info ============================
2025-12-04T10:35:19.6406073Z FAILED [0.2612s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.6407654Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6408747Z ^
2025-12-04T10:35:19.6409405Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6410063Z
2025-12-04T10:35:19.6410825Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.6411747Z
2025-12-04T10:35:19.6411753Z
2025-12-04T10:35:19.6411992Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.6413249Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6414176Z
2025-12-04T10:35:19.6414450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.6415035Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.6415503Z ========================== 1 failed, 2 rerun in 2.25s ==========================
2025-12-04T10:35:19.6415899Z Got exit code 1
2025-12-04T10:35:19.6416114Z Retrying single test...
2025-12-04T10:35:19.6416994Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.xml
2025-12-04T10:35:19.6417723Z ============================= test session starts ==============================
2025-12-04T10:35:19.6418460Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.6419225Z cachedir: .pytest_cache
2025-12-04T10:35:19.6420018Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.6420905Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.6421284Z configfile: pytest.ini
2025-12-04T10:35:19.6422030Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.6422855Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.6423824Z stepcurrent: skipping 0 already run items.
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.6424649Z Running 1 items in this shard
2025-12-04T10:35:19.6424832Z
2025-12-04T10:35:19.6425900Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6427897Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6429192Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.6430036Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:19.6430957Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6431895Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.6432852Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6434040Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6435202Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6436423Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6437859Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6439047Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.6440230Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6441222Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.6442125Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.6443003Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.6444034Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6445256Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6446437Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6447559Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6448584Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6449677Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6450807Z
E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6451887Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6452826Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6453709Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.6454687Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6455656Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.6456625Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6457675Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6458848Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6460357Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6461380Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.6463590Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1':
'*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6466307Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6467885Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6469419Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6470929Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6472382Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6473829Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6475378Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6476678Z E1204 10:15:49.686000 73783
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6478134Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6479367Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6480557Z E1204 10:15:49.686000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6481538Z ('RERUN', {'yellow': True}) [1.6944s] [100%]
2025-12-04T10:35:19.6482814Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6484784Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6486067Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.6487045Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:19.6487973Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6488919Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.6489889Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6490927Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6491991Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6493091Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6494191Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6495156Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.6496078Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6497126Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.6498027Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.6498899Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.6500127Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6501228Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6502243Z E1204 10:15:49.981000 73783
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6503246Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6504268Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6505388Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6506549Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6507628Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6508916Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6509889Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.6510932Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6511986Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.6513097Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6514151Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6515318Z E1204 10:15:49.981000 73783
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6516632Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6517649Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.6519861Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6522197Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6523762Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6525295Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6526709Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6528152Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6529594Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6531111Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6532403Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6533843Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6535074Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6536305Z E1204 10:15:49.981000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture.
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6537289Z ('RERUN', {'yellow': True}) [0.2623s] [100%]
2025-12-04T10:35:19.6538649Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.6540693Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6541985Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.6542833Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:19.6543755Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.6544693Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.6545680Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.6546728Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.6547799Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.6548897Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.6550075Z E1204
10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.6551031Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.6551950Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.6552914Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.6553818Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.6554693Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.6555728Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.6556831Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.6557845Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.6558843Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.6559867Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.6560962Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.6562093Z E1204 10:15:50.245000 73783
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.6563163Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.6564096Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.6565067Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.6566043Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.6567013Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.6567980Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.6569037Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.6570208Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.6571527Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.6572539Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.6574740Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2':
'*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.6577242Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.6578703Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.6580335Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.6581749Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.6583193Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.6584630Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.6586194Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.6587481Z E1204 10:15:50.245000 73783
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.6588932Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.6590165Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.6591438Z E1204 10:15:50.245000 73783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.6592396Z FAILED [0.2619s] [100%]
2025-12-04T10:35:19.6592548Z
2025-12-04T10:35:19.6592667Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.6593182Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _
2025-12-04T10:35:19.6593670Z Traceback (most recent call last):
2025-12-04T10:35:19.6594250Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.6595043Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.6595826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.6596573Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.6597335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.6598048Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.6598746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.6599506Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.6600192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.6601041Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.6601923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.6602648Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.6603337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.6603964Z return self._compile_to_module() 2025-12-04T10:35:19.6604570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.6605236Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.6605934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.6606596Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.6607231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.6608189Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.6609010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.6609727Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.6610352Z File "/tmp/tmppwu8nk_3/4m/c4mqblwv37do654itola2doyw47mjtysh4z2t4si662nnp7a4ado.py", line 62, in 2025-12-04T10:35:19.6611284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.6611889Z kernel.precompile( 2025-12-04T10:35:19.6612522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.6613218Z self._precompile_worker() 2025-12-04T10:35:19.6613911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.6614685Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.6615606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.6616414Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.6617089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.6617801Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.6618514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.6619342Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.6619947Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.6620689Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6621321Z ^ 2025-12-04T10:35:19.6621827Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.6622333Z 2025-12-04T10:35:19.6622942Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.6623680Z 2025-12-04T10:35:19.6623685Z 2025-12-04T10:35:19.6623867Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.6624965Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.6625762Z 2025-12-04T10:35:19.6625990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.6626516Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.6626899Z frames [('total', 1)] 2025-12-04T10:35:19.6627146Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.6627525Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.6628019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.6628466Z graph_break [] 2025-12-04T10:35:19.6628907Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T10:35:19.6629452Z Traceback (most recent call last): 2025-12-04T10:35:19.6630035Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.6630737Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.6631470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.6632264Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.6633033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.6633757Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.6634470Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.6635140Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.6635836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.6636686Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.6637523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.6638203Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.6638959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.6639589Z return self._compile_to_module() 2025-12-04T10:35:19.6640194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.6640869Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.6641557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.6642235Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.6642866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.6643602Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.6644421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.6645147Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.6645805Z File "/tmp/tmpwr23i0hc/hh/chhcfwf5wj6yh4jwa4injyyvcfmhtvghjfm5pjquinmgloh6xgzy.py", line 62, in 2025-12-04T10:35:19.6646767Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.6647370Z kernel.precompile( 2025-12-04T10:35:19.6648006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.6648782Z self._precompile_worker() 2025-12-04T10:35:19.6649477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.6650262Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.6651027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.6651830Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.6652500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.6653215Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.6653927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.6654732Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.6655339Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.6656127Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6656765Z ^ 2025-12-04T10:35:19.6657273Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.6657787Z 2025-12-04T10:35:19.6658418Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.6659184Z 2025-12-04T10:35:19.6659188Z 2025-12-04T10:35:19.6659389Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.6660379Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.6661189Z 2025-12-04T10:35:19.6661425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.6661968Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.6662365Z frames [('total', 1)] 2025-12-04T10:35:19.6662605Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.6662999Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.6663606Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.6663990Z graph_break [] 2025-12-04T10:35:19.6664312Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.6664711Z frames [('total', 1)] 2025-12-04T10:35:19.6664946Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.6665329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.6665908Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.6666318Z graph_break [] 2025-12-04T10:35:19.6666559Z =================================== FAILURES =================================== 2025-12-04T10:35:19.6667073Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T10:35:19.6667575Z Traceback (most recent call last): 2025-12-04T10:35:19.6668164Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.6668903Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 
2025-12-04T10:35:19.6669669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.6670429Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.6671213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.6672057Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.6695170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.6695922Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.6696635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.6697508Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.6698357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.6699119Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.6699779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.6700425Z return self._compile_to_module() 2025-12-04T10:35:19.6701039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.6701702Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.6702390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.6703062Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.6703703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in 
load_by_key_path 2025-12-04T10:35:19.6704441Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.6705258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.6705986Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.6706626Z File "/tmp/tmpd80y832x/23/c23whko23vhdgy2z4vyb3poajgcgorw2xfm2hs6apsy2gu2ag65k.py", line 62, in 2025-12-04T10:35:19.6707562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.6708479Z kernel.precompile( 2025-12-04T10:35:19.6709112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.6709800Z self._precompile_worker() 2025-12-04T10:35:19.6710657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.6711435Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.6712201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.6713008Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.6713681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.6714386Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.6715075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.6715853Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.6716459Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.6717197Z def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6717815Z ^ 2025-12-04T10:35:19.6718308Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.6718939Z 2025-12-04T10:35:19.6719551Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.6720272Z 2025-12-04T10:35:19.6720276Z 2025-12-04T10:35:19.6720462Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.6721426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.6722226Z 2025-12-04T10:35:19.6722452Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.6722977Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.6723361Z frames [('total', 1)] 2025-12-04T10:35:19.6723598Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.6723972Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.6724477Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.6724859Z graph_break [] 2025-12-04T10:35:19.6725172Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.6725603Z frames [('total', 1)] 2025-12-04T10:35:19.6725835Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.6726200Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.6726701Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.6727100Z graph_break [] 2025-12-04T10:35:19.6727398Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.6727773Z frames [('total', 1)] 2025-12-04T10:35:19.6728010Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.6728365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.6728854Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.6729261Z graph_break [] 2025-12-04T10:35:19.6729943Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.xml - 2025-12-04T10:35:19.6730753Z =========================== short test summary info ============================ 2025-12-04T10:35:19.6731675Z FAILED [0.2619s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.6732961Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6733589Z ^ 2025-12-04T10:35:19.6734076Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.6734587Z 2025-12-04T10:35:19.6735190Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.6735966Z 2025-12-04T10:35:19.6735977Z 2025-12-04T10:35:19.6736161Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.6737135Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.6737927Z 2025-12-04T10:35:19.6738166Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.6738657Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.6739156Z ================== 1 failed, 187 deselected, 2 rerun in 2.25s ================== 2025-12-04T10:35:19.6739528Z Got exit code 1 2025-12-04T10:35:19.6739739Z Retrying single test... 2025-12-04T10:35:19.6740294Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.xml 2025-12-04T10:35:19.6741037Z ============================= test session starts ============================== 2025-12-04T10:35:19.6741582Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.6742084Z cachedir: .pytest_cache 2025-12-04T10:35:19.6742677Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.6743346Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.6743635Z configfile: pytest.ini 2025-12-04T10:35:19.6744248Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.6745003Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.6745902Z stepcurrent: skipping 0 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.6746718Z Running 1 items in this shard 2025-12-04T10:35:19.6746904Z 2025-12-04T10:35:19.6747966Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.6749949Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6751231Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.6752072Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:19.6753002Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:19.6753942Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.6754892Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.6756042Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.6757103Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.6758198Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.6759277Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.6760226Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.6761147Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.6762102Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.6763005Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.6763871Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.6764907Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.6766141Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.6767151Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.6768145Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.6769171Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.6770260Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.6771386Z 
E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.6772469Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.6773401Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.6774279Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.6775252Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.6776269Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.6777233Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.6778291Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.6779501Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.6780919Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.6781924Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.6784125Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': 
'*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.6786509Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.6787965Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.6789491Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.6790888Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.6792415Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.6793852Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.6795392Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.6796699Z E1204 10:16:00.381000 73964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.6798143Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6799371Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.6800548Z E1204 10:16:00.381000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.6801529Z ('RERUN', {'yellow': True}) [1.7020s] [100%] 2025-12-04T10:35:19.6802794Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.6804770Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6806044Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.6806883Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:19.6808143Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:19.6809095Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 
2025-12-04T10:35:19.6810045Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.6811067Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.6812127Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.6813224Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.6814307Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.6815256Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.6816227Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.6817299Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.6818200Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.6819116Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.6820156Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.6821252Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.6822269Z E1204 10:16:00.677000 73964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.6823262Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.6824289Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.6825378Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.6826505Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.6827575Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.6828504Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.6829384Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.6830358Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.6831321Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.6832276Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.6833408Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.6834570Z E1204 10:16:00.677000 73964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.6835932Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.6836941Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.6839148Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.6841472Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.6843004Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.6844537Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.6845992Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T10:35:19.6847425Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.6848862Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.6850382Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.6851672Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.6853112Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6854331Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.6855512Z E1204 10:16:00.677000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.6856495Z ('RERUN', {'yellow': True}) [0.2624s] [100%] 2025-12-04T10:35:19.6857758Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.6859848Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6861124Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.6861971Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:19.6862887Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:19.6863820Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.6864767Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.6865839Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.6866899Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.6867990Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.6869181Z E1204 
10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.6870134Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.6871060Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.6872013Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.6873230Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.6874209Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.6875406Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.6876644Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.6877755Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.6878914Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.6880052Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.6881217Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.6882481Z E1204 10:16:00.941000 73964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.6883672Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.6884723Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.6885827Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.6886883Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.6887949Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.6889064Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.6890239Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.6891460Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.6892914Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.6894039Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.6896367Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': 
'*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.6898971Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.6900587Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.6902270Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.6903785Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.6905319Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.6906965Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.6908877Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.6910252Z E1204 10:16:00.941000 73964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.6911834Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6913174Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.6914615Z E1204 10:16:00.941000 73964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.6915726Z FAILED [0.2621s] [100%] 2025-12-04T10:35:19.6915910Z 2025-12-04T10:35:19.6916099Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.6916696Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T10:35:19.6917338Z Traceback (most recent call last): 2025-12-04T10:35:19.6918007Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.6918797Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.6919690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.6920550Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.6921406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.6922246Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.6923066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T10:35:19.6923818Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.6924823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.6925771Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.6926685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.6927558Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.6928295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.6928961Z return self._compile_to_module() 2025-12-04T10:35:19.6929758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.6930512Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.6931324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.6932091Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.6932818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.6933689Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.6934628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.6935423Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.6936269Z File "/tmp/tmpgjvjt70m/4k/c4kmuoqlw7haje36g2hzye4whuln2vkn37jxrokygmpdgjaschfj.py", line 62, in 2025-12-04T10:35:19.6937322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.6937975Z kernel.precompile( 2025-12-04T10:35:19.6938773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.6939637Z self._precompile_worker() 2025-12-04T10:35:19.6940411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.6941309Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.6942270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.6943188Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.6943993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.6944760Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.6945632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.6946561Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.6947266Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.6948067Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6948829Z ^ 2025-12-04T10:35:19.6949446Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.6949985Z 2025-12-04T10:35:19.6950639Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.6951438Z 2025-12-04T10:35:19.6951442Z 2025-12-04T10:35:19.6951672Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.6952845Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.6953703Z 2025-12-04T10:35:19.6953948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.6954653Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.6955095Z frames [('total', 1)] 2025-12-04T10:35:19.6955413Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.6955976Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.6956573Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.6956994Z graph_break [] 2025-12-04T10:35:19.6957571Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T10:35:19.6958153Z Traceback (most recent call last): 2025-12-04T10:35:19.6958773Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.6959664Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.6960491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.6961383Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.6962212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.6963021Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.6963893Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.6964672Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.6965418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.6966468Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.6967409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.6968205Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.6968939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.6969756Z return self._compile_to_module() 2025-12-04T10:35:19.6970489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.6971302Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.6972038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.6972829Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.6973628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.6974473Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.6975343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.6976251Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.6977009Z File "/tmp/tmpfasun0k5/5e/c5eag5bwyt6iyfm2wou25d5fxqzs53tabd65xn2grbj46tetm5rr.py", line 62, in 2025-12-04T10:35:19.6978009Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.6978756Z kernel.precompile( 2025-12-04T10:35:19.6979588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.6980442Z self._precompile_worker() 2025-12-04T10:35:19.6981284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.6982150Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.6982989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.6983956Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.6984675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.6985506Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.6986373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.6987238Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.6987878Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.6988796Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.6989506Z ^ 2025-12-04T10:35:19.6990147Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.6990698Z 2025-12-04T10:35:19.6991338Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.6992120Z 2025-12-04T10:35:19.6992124Z 2025-12-04T10:35:19.6992339Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.6993463Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.6994305Z 2025-12-04T10:35:19.6994594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.6995171Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.6995714Z frames [('total', 1)] 2025-12-04T10:35:19.6996061Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.6996548Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.6997228Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.6997725Z graph_break [] 2025-12-04T10:35:19.6998158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.6998641Z frames [('total', 1)] 2025-12-04T10:35:19.6998989Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.6999465Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7000123Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7000593Z graph_break [] 2025-12-04T10:35:19.7000929Z =================================== FAILURES =================================== 2025-12-04T10:35:19.7001621Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T10:35:19.7002190Z Traceback (most recent call last): 2025-12-04T10:35:19.7002858Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7003708Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 
2025-12-04T10:35:19.7004569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7005360Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7006267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7008445Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7009356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7010204Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7010984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7011913Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7012938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7013679Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7014405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7015218Z return self._compile_to_module() 2025-12-04T10:35:19.7015922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7016651Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7017501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7018264Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7019127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in 
load_by_key_path 2025-12-04T10:35:19.7019938Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7020837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7021734Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7022437Z File "/tmp/tmps4_nfeqs/7v/c7vd5dgbf5mkzsxjeurorup7kyyotjy6sbjzorolm46evt2byqvh.py", line 62, in 2025-12-04T10:35:19.7023480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7024221Z kernel.precompile( 2025-12-04T10:35:19.7024951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7025713Z self._precompile_worker() 2025-12-04T10:35:19.7026665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7027578Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7028435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7029325Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7030124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7030921Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7031771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7032628Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7033320Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7034211Z def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7034942Z ^ 2025-12-04T10:35:19.7035517Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7036158Z 2025-12-04T10:35:19.7036958Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7037743Z 2025-12-04T10:35:19.7037748Z 2025-12-04T10:35:19.7037964Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7038995Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.7047346Z 2025-12-04T10:35:19.7047611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7048151Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7048534Z frames [('total', 1)] 2025-12-04T10:35:19.7048779Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7049157Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7049669Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7050045Z graph_break [] 2025-12-04T10:35:19.7050346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7050728Z frames [('total', 1)] 2025-12-04T10:35:19.7050960Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7051320Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7051819Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7052208Z graph_break [] 2025-12-04T10:35:19.7052516Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7052900Z frames [('total', 1)] 2025-12-04T10:35:19.7053132Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7053490Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7053985Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7054383Z graph_break [] 2025-12-04T10:35:19.7055064Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.xml - 2025-12-04T10:35:19.7055874Z =========================== short test summary info ============================ 2025-12-04T10:35:19.7056805Z FAILED [0.2621s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7058113Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7058730Z ^ 2025-12-04T10:35:19.7059291Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7059798Z 2025-12-04T10:35:19.7060413Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7061138Z 2025-12-04T10:35:19.7061142Z 2025-12-04T10:35:19.7061332Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7062301Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.7063101Z 2025-12-04T10:35:19.7063333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7063833Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.7064263Z ================== 1 failed, 187 deselected, 2 rerun in 2.26s ================== 2025-12-04T10:35:19.7064633Z Got exit code 1 2025-12-04T10:35:19.7065235Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.7066256Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:19.7067119Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.xml 2025-12-04T10:35:19.7067765Z ============================= test session starts ============================== 2025-12-04T10:35:19.7068317Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.7068814Z cachedir: .pytest_cache 2025-12-04T10:35:19.7069415Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.7070091Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.7070380Z configfile: pytest.ini 2025-12-04T10:35:19.7070991Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:19.7071747Z collecting ... collected 188 items / 1 deselected / 187 selected 2025-12-04T10:35:19.7072167Z stepcurrent: skipping 1 already run items. 2025-12-04T10:35:19.7072483Z Running 187 items in this shard 2025-12-04T10:35:19.7072658Z 2025-12-04T10:35:19.7073731Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.7075708Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7076992Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.7077842Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.7078280Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.7078670Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.7079118Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.7079694Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7080185Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.7080674Z E1204 10:16:11.111000 74145 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.7081155Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.7081522Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.7081960Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.7082359Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.7082743Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.7083118Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.7083663Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.7084180Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7084632Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.7085055Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.7085549Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.7086028Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 
= tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.7086555Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.7086986Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7087377Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.7087755Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.7088232Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.7088605Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.7089083Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.7089537Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.7090138Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.7090812Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.7091116Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7092901Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: 
{'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7093363Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7094254Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7094792Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7095624Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7096198Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7096945Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7097597Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T10:35:19.7098121Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7098933Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7099287Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7100050Z E1204 10:16:11.111000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7100163Z ('RERUN', {'yellow': True}) [1.7114s] [ 0%] 2025-12-04T10:35:19.7101212Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.7102016Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7102374Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.7102821Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.7103269Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.7103654Z E1204 10:16:11.410000 74145 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.7104101Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.7104564Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7105052Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.7105552Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.7106017Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.7106387Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.7106825Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.7107327Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.7108249Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.7108727Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.7109287Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.7109727Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 
tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7110183Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.7110611Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.7111097Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.7111587Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.7112113Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.7112537Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7112937Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.7113309Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.7113800Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.7114164Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.7114787Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.7115240Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.7115829Z E1204 
10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.7116428Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.7116727Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7118523Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7118974Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7119973Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7120505Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7121267Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T10:35:19.7121849Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7122599Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7123254Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7123776Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7124594Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7124894Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7125663Z E1204 10:16:11.410000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7125780Z ('RERUN', {'yellow': True}) [0.2653s] [ 0%] 2025-12-04T10:35:19.7126921Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.7127740Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7128097Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.7128482Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.7128915Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.7129298Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.7129761Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.7130212Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7130709Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.7131196Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.7131744Z 
E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.7132116Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.7132554Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.7132952Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.7133337Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.7133712Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.7134266Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.7134706Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7135171Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.7135641Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.7136133Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.7136617Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.7137144Z E1204 10:16:11.677000 74145 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.7137574Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7137963Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.7138419Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.7138904Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.7139324Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.7139817Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.7140263Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.7140861Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.7141459Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.7141755Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7143536Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': 
'*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7144074Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7144962Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7145522Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7146312Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7146885Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7147646Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7148298Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7148825Z E1204 10:16:11.677000 74145 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7149632Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7149932Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7150798Z E1204 10:16:11.677000 74145 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7150882Z FAILED [0.2650s] [ 0%] 2025-12-04T10:35:19.7150887Z 2025-12-04T10:35:19.7151009Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.7151278Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T10:35:19.7151385Z Traceback (most recent call last): 2025-12-04T10:35:19.7151778Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7151987Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.7152413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7152625Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7153066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7153231Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7153662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T10:35:19.7153861Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7154323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7154592Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7155035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7155161Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7155571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7155676Z return self._compile_to_module() 2025-12-04T10:35:19.7156086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7156227Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7156667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7156770Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7157195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.7157387Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7157884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7157993Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7158424Z File "/tmp/tmph4pfb5gr/di/cdid7yg2wwwtxydkq7m5a4t26b4ojjxldgdwhzxvmxuaety5klet.py", line 62, in 2025-12-04T10:35:19.7158831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.7158921Z kernel.precompile(
2025-12-04T10:35:19.7159394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7159492Z self._precompile_worker()
2025-12-04T10:35:19.7159998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7160147Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7160728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7160898Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7161281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7161487Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7161865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7162155Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7162347Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7162787Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7162857Z ^
2025-12-04T10:35:19.7163251Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7163256Z
2025-12-04T10:35:19.7163864Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7163869Z
2025-12-04T10:35:19.7163873Z
2025-12-04T10:35:19.7164131Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7164817Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7164822Z
2025-12-04T10:35:19.7165043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7165232Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7165320Z frames [('total', 1)]
2025-12-04T10:35:19.7165426Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7165628Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7165810Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7165893Z graph_break []
2025-12-04T10:35:19.7166166Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7166267Z Traceback (most recent call last):
2025-12-04T10:35:19.7166660Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7166860Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7167275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7167491Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7167932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7168089Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7168523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7168646Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7169104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7169381Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7169821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7169948Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7170435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7170539Z return self._compile_to_module()
2025-12-04T10:35:19.7170949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7171082Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7171525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7171635Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7172054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7172257Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7172752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7172864Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7173293Z File "/tmp/tmp3grrqx61/vv/cvv2v522hnbk3edgdy4pt67uldhm62ysumjfcxgcxspjkoj5fazb.py", line 62, in
2025-12-04T10:35:19.7173685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7173787Z kernel.precompile(
2025-12-04T10:35:19.7174258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7174434Z self._precompile_worker()
2025-12-04T10:35:19.7174939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7175084Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7175640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7175809Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7176188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7176401Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7176770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7177061Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7177250Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7177683Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7177757Z ^
2025-12-04T10:35:19.7178144Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7178156Z
2025-12-04T10:35:19.7178768Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7178773Z
2025-12-04T10:35:19.7178777Z
2025-12-04T10:35:19.7178956Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7179686Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7179704Z
2025-12-04T10:35:19.7179928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7180105Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7180193Z frames [('total', 1)]
2025-12-04T10:35:19.7180286Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7180576Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7180767Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7180846Z graph_break []
2025-12-04T10:35:19.7181029Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7181116Z frames [('total', 1)]
2025-12-04T10:35:19.7181209Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7181394Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7181592Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7181669Z graph_break []
2025-12-04T10:35:19.7181787Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7182053Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7182151Z Traceback (most recent call last):
2025-12-04T10:35:19.7182537Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7182739Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7183152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7183357Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7183791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7184035Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7184465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7184591Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7185048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7185325Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7185822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7185940Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7186342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7186453Z return self._compile_to_module()
2025-12-04T10:35:19.7186861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7186999Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7187434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7187539Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7187963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7188153Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7188657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7188780Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7189219Z File "/tmp/tmp5gmwdyh3/xg/cxg7cacdaqrctjmjcdvbfa4kodev4x2fsazm3rlfps7jo3hvjass.py", line 62, in
2025-12-04T10:35:19.7189612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7189708Z kernel.precompile(
2025-12-04T10:35:19.7190180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7190384Z self._precompile_worker()
2025-12-04T10:35:19.7190897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7191045Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7191556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7191728Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7192104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7192312Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7192686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7192984Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7193175Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7193609Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7193685Z ^
2025-12-04T10:35:19.7194071Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7194156Z
2025-12-04T10:35:19.7194765Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7194769Z
2025-12-04T10:35:19.7194773Z
2025-12-04T10:35:19.7194957Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7195689Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7195699Z
2025-12-04T10:35:19.7195927Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7196102Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7196193Z frames [('total', 1)]
2025-12-04T10:35:19.7196284Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7196481Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7196675Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7196751Z graph_break []
2025-12-04T10:35:19.7196935Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7197019Z frames [('total', 1)]
2025-12-04T10:35:19.7197111Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7197298Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7197495Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7197572Z graph_break []
2025-12-04T10:35:19.7197748Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7197829Z frames [('total', 1)]
2025-12-04T10:35:19.7197921Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7198111Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7198305Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7198392Z graph_break []
2025-12-04T10:35:19.7198950Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.xml -
2025-12-04T10:35:19.7199095Z =========================== short test summary info ============================
2025-12-04T10:35:19.7199940Z FAILED [0.2650s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7200456Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7200528Z ^
2025-12-04T10:35:19.7200915Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7200920Z
2025-12-04T10:35:19.7201524Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7201534Z
2025-12-04T10:35:19.7201538Z
2025-12-04T10:35:19.7201723Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7202398Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7202402Z
2025-12-04T10:35:19.7202637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7202786Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7202948Z =================== 1 failed, 1 deselected, 2 rerun in 2.28s ===================
2025-12-04T10:35:19.7203028Z Got exit code 1
2025-12-04T10:35:19.7203118Z Retrying single test...
2025-12-04T10:35:19.7203523Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.xml
2025-12-04T10:35:19.7203825Z ============================= test session starts ==============================
2025-12-04T10:35:19.7204114Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7204208Z cachedir: .pytest_cache
2025-12-04T10:35:19.7204652Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7204760Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7204851Z configfile: pytest.ini
2025-12-04T10:35:19.7205320Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7205541Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.7206158Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7206255Z Running 1 items in this shard
2025-12-04T10:35:19.7206260Z
2025-12-04T10:35:19.7207330Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7208366Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7208734Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.7209106Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150
2025-12-04T10:35:19.7209557Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7209942Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.7210396Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7210979Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7211475Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7211971Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7212444Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7212819Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.7213262Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7213665Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.7214050Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.7214430Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.7214972Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7215574Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7216037Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7216458Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7216951Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7217430Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7217966Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7218399Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7218792Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7219205Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.7219691Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7220063Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.7220549Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7221008Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7221599Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7222282Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7222583Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.7224367Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7224834Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7225773Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7226310Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7227169Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7227747Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7228501Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7229172Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7229686Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7230504Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7230814Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7231577Z E1204 10:16:21.830000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7231693Z ('RERUN', {'yellow': True}) [1.7086s] [100%]
2025-12-04T10:35:19.7232748Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7233561Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7233918Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.7234290Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150
2025-12-04T10:35:19.7234813Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7235198Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.7235652Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7236108Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7236596Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7237092Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7237571Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7237948Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.7238384Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7238863Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.7239244Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.7239616Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.7240171Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7240611Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7241075Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7241497Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7241987Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7242474Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7243007Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7243434Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7243826Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7244195Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.7244686Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7245052Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.7245617Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7246066Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7246658Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7247257Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7247559Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.7249349Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7249799Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7250773Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7251303Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7252068Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7252643Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7253396Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7254056Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7254572Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7255394Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7255700Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7256466Z E1204 10:16:22.129000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7256581Z ('RERUN', {'yellow': True}) [0.2657s] [100%]
2025-12-04T10:35:19.7257640Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7258542Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7258927Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.7259390Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150
2025-12-04T10:35:19.7259855Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7260275Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.7260760Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7261255Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7261784Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7262311Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7262867Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7263232Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.7263674Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7264084Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.7264466Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.7264841Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.7265400Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.7265880Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7266346Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.7266769Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7267260Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7267742Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.7268277Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.7268701Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7269090Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.7269556Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.7270043Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.7270412Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.7270895Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.7271348Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.7271950Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask)
2025-12-04T10:35:19.7272549Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None)
2025-12-04T10:35:19.7272855Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.7274632Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7275204Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7276085Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7276625Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7277384Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7277960Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7278723Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7279377Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7279901Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7280711Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7281025Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7281862Z E1204 10:16:22.396000 74326 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7281945Z FAILED [0.2652s] [100%]
2025-12-04T10:35:19.7281950Z
2025-12-04T10:35:19.7282071Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.7282340Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7282452Z Traceback (most recent call last):
2025-12-04T10:35:19.7282838Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7283038Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7283456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7283665Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7284108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7284272Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7284701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7284912Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7285362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7285655Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7286126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7286250Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7286667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7286770Z return self._compile_to_module()
2025-12-04T10:35:19.7287178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7287320Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7287761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7287865Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7288287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7288478Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7288983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7289085Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7289512Z File "/tmp/tmpv46b3rzk/rs/crsizjblpp47j77ikke7sn2zycwm7pk7pz3ig2sccvrsf6mc25l3.py", line 62, in
2025-12-04T10:35:19.7289906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7289993Z kernel.precompile(
2025-12-04T10:35:19.7290473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7290565Z self._precompile_worker()
2025-12-04T10:35:19.7291069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7291218Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7291802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7291971Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7292349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7292550Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7292926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7293211Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7293403Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7293842Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7293909Z ^
2025-12-04T10:35:19.7294306Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7294311Z
2025-12-04T10:35:19.7294914Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7294919Z
2025-12-04T10:35:19.7294923Z
2025-12-04T10:35:19.7295106Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7295923Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7295928Z
2025-12-04T10:35:19.7296149Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7296336Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7296417Z frames [('total', 1)]
2025-12-04T10:35:19.7296511Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7296717Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7296900Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7296984Z graph_break []
2025-12-04T10:35:19.7297252Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7297350Z Traceback (most recent call last):
2025-12-04T10:35:19.7297746Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7297945Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7298356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7298565Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7299002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7299211Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7299647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7299768Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7300222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7300497Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7300939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7301058Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7301460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7301644Z return self._compile_to_module()
2025-12-04T10:35:19.7302054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7302190Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7302629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7302739Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7303161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7303353Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7303849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7303959Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7304392Z File "/tmp/tmp3rljc4wb/a6/ca6mehpo2smnii23oqqdmk3z7tb3ehs5wa5gwwusvheoygzbfdlu.py", line 62, in
2025-12-04T10:35:19.7304785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7304871Z kernel.precompile(
2025-12-04T10:35:19.7305339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7305565Z self._precompile_worker()
2025-12-04T10:35:19.7306074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7306223Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7306729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7306899Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7307288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7307490Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7308044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7308335Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7308525Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7308962Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7309031Z ^
2025-12-04T10:35:19.7309419Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7309423Z
2025-12-04T10:35:19.7310042Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7310047Z
2025-12-04T10:35:19.7310051Z
2025-12-04T10:35:19.7310233Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7310919Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7310928Z
2025-12-04T10:35:19.7311157Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7311344Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7311432Z frames [('total', 1)]
2025-12-04T10:35:19.7311523Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7311729Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7312062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7312145Z graph_break []
2025-12-04T10:35:19.7312325Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7312407Z frames [('total', 1)]
2025-12-04T10:35:19.7312497Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7312680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7312883Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7312962Z graph_break []
2025-12-04T10:35:19.7313078Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7313353Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _
2025-12-04T10:35:19.7313463Z Traceback (most recent call last):
2025-12-04T10:35:19.7313847Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7314051Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7314466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7314670Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7315111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7315406Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7315871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7315998Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7316455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7316735Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7317173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7317291Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7317697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7317798Z return self._compile_to_module()
2025-12-04T10:35:19.7318206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7318346Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7318782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7318890Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7319310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7319501Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7320000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7320102Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7320526Z File "/tmp/tmp2n_j7lvu/iw/ciwty7kgg2xlox3iafaem2ishfgcqak44inlcwyvbbmi63ff2ard.py", line 62, in
2025-12-04T10:35:19.7320914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7321008Z kernel.precompile(
2025-12-04T10:35:19.7321480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7321573Z self._precompile_worker()
2025-12-04T10:35:19.7322158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7322311Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7322813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7322991Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7323372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7323572Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7323956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7324238Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7324441Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7324872Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7324940Z ^
2025-12-04T10:35:19.7325335Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7325421Z
2025-12-04T10:35:19.7326026Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7326031Z
2025-12-04T10:35:19.7326035Z
2025-12-04T10:35:19.7326215Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7326893Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7326902Z
2025-12-04T10:35:19.7327124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7327307Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7327390Z frames [('total', 1)]
2025-12-04T10:35:19.7327491Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7327687Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7327875Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7327957Z graph_break []
2025-12-04T10:35:19.7328341Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7341080Z frames [('total', 1)]
2025-12-04T10:35:19.7341214Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7341472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7341707Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7341807Z graph_break []
2025-12-04T10:35:19.7342006Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7342098Z frames [('total', 1)]
2025-12-04T10:35:19.7342198Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7342391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7342590Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.7342680Z graph_break []
2025-12-04T10:35:19.7343257Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.xml -
2025-12-04T10:35:19.7343407Z =========================== short test summary info ============================
2025-12-04T10:35:19.7344079Z FAILED [0.2652s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7344643Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7344752Z ^
2025-12-04T10:35:19.7345242Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7345249Z
2025-12-04T10:35:19.7345983Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7345993Z
2025-12-04T10:35:19.7345997Z
2025-12-04T10:35:19.7346190Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7346877Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7346882Z
2025-12-04T10:35:19.7347122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7347280Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7347481Z ================== 1 failed, 187 deselected, 2 rerun in 2.27s ==================
2025-12-04T10:35:19.7347570Z Got exit code 1
2025-12-04T10:35:19.7347661Z Retrying single test...
2025-12-04T10:35:19.7348067Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.xml
2025-12-04T10:35:19.7348303Z ============================= test session starts ==============================
2025-12-04T10:35:19.7348600Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7348698Z cachedir: .pytest_cache
2025-12-04T10:35:19.7349148Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7349255Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7349359Z configfile: pytest.ini
2025-12-04T10:35:19.7349826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7350018Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.7350637Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda
2025-12-04T10:35:19.7350745Z Running 1 items in this shard
2025-12-04T10:35:19.7350749Z
2025-12-04T10:35:19.7351822Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7352646Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7353017Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.7353394Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150
2025-12-04T10:35:19.7353840Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.7354238Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.7354692Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7355259Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7355798Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7356293Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810]
[0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.7356779Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.7357153Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.7357600Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.7358004Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.7358393Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.7358776Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.7359326Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.7359857Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7360319Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.7360779Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.7361316Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.7361804Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 
2025-12-04T10:35:19.7362369Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.7362835Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7363239Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.7363616Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.7364106Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.7364485Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.7364972Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.7365530Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.7366301Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.7370642Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.7370969Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7372757Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7373229Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7374181Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7374823Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7375725Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7376721Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7377602Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7378253Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7378776Z 
E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7379737Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7380045Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7380809Z E1204 10:16:32.579000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7380919Z ('RERUN', {'yellow': True}) [1.7074s] [100%] 2025-12-04T10:35:19.7381980Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.7382824Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7383190Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.7383563Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.7384102Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.7384490Z E1204 10:16:32.879000 74507 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.7384944Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.7385453Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7385942Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.7386433Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.7386903Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.7387272Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.7387708Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.7388222Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.7388607Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.7388975Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.7410283Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.7410753Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 
tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7411207Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.7411630Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.7412125Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.7412607Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.7413141Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.7413571Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7413967Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.7414336Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.7414827Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.7415203Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.7415730Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.7416415Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.7417010Z E1204 
10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.7417612Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.7417925Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7419814Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7420299Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7421379Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7421954Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7422776Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T10:35:19.7423397Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7424201Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7424912Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7425520Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7426408Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7426737Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7427557Z E1204 10:16:32.879000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7427679Z ('RERUN', {'yellow': True}) [0.2663s] [100%] 2025-12-04T10:35:19.7428828Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.7429787Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7430147Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.7430533Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.7430979Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.7431363Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.7431823Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.7432279Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7432780Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.7433270Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.7433824Z 
E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.7434195Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.7434637Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.7435043Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.7435455Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.7435850Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.7436399Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.7436843Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7437298Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.7437731Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.7438224Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.7438706Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.7439252Z E1204 10:16:33.146000 74507 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.7439677Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7440073Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.7440527Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.7441008Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.7441384Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.7441863Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.7442320Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.7442916Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T10:35:19.7443518Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T10:35:19.7443818Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7445649Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': 
'*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7446197Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7447090Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7447628Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7448385Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7448966Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7449714Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7450373Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7450892Z E1204 10:16:33.146000 74507 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7451707Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7452016Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7452887Z E1204 10:16:33.146000 74507 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7452979Z FAILED [0.2653s] [100%] 2025-12-04T10:35:19.7452985Z 2025-12-04T10:35:19.7453112Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.7453395Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T10:35:19.7453503Z Traceback (most recent call last): 2025-12-04T10:35:19.7453887Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7454105Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.7454526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7454742Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7455199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7455360Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7455803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T10:35:19.7455924Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7456461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7456746Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7457192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7457329Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7457746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7457847Z return self._compile_to_module() 2025-12-04T10:35:19.7458264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7458400Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7458837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7458959Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7459465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.7459666Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7460165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7460274Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7460715Z File "/tmp/tmpb9w0s2xl/xf/cxfgzk5y7ii4s24flmdrloryw2k5hvtbdpigtzky3asn5fgwefle.py", line 62, in <module> 2025-12-04T10:35:19.7461107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7461203Z kernel.precompile( 2025-12-04T10:35:19.7461683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7461779Z self._precompile_worker() 2025-12-04T10:35:19.7462289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7462441Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7463029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7463204Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7463582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7463795Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7464167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7464456Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7464655Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7465095Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7465177Z ^ 2025-12-04T10:35:19.7465628Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7465633Z 2025-12-04T10:35:19.7466236Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7466243Z 2025-12-04T10:35:19.7466247Z 2025-12-04T10:35:19.7466432Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7467201Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.7467206Z 2025-12-04T10:35:19.7467436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7467617Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7467709Z frames [('total', 1)] 2025-12-04T10:35:19.7467811Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7468017Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7468207Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7468290Z graph_break [] 2025-12-04T10:35:19.7468560Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T10:35:19.7468674Z Traceback (most recent call last): 2025-12-04T10:35:19.7469054Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7469264Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.7469688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7469894Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7470347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7470509Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7470940Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7471067Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7471519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7471797Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7472239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7472360Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7472769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7473012Z return self._compile_to_module() 2025-12-04T10:35:19.7473423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7473567Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7474007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7474125Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7474544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.7474736Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7475237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7475340Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7475753Z File "/tmp/tmpip_kb2yj/tg/ctgwtyu5m2wrux5ehu73k2o5wof472qed5rnrw7nbx2o5ar533mj.py", line 62, in 2025-12-04T10:35:19.7476143Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7476232Z kernel.precompile( 2025-12-04T10:35:19.7476715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7476895Z self._precompile_worker() 2025-12-04T10:35:19.7477404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7477566Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7478074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7478254Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7478644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7478854Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7479235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7479523Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7479731Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7480167Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7480240Z ^ 2025-12-04T10:35:19.7480641Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7480645Z 2025-12-04T10:35:19.7481257Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7481262Z 2025-12-04T10:35:19.7481266Z 2025-12-04T10:35:19.7481454Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7482137Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.7482146Z 2025-12-04T10:35:19.7482372Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7482564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7482652Z frames [('total', 1)] 2025-12-04T10:35:19.7482754Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7482958Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7483228Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7483317Z graph_break [] 2025-12-04T10:35:19.7483497Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7483584Z frames [('total', 1)] 2025-12-04T10:35:19.7483690Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7483875Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7484076Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7484164Z graph_break [] 2025-12-04T10:35:19.7484288Z =================================== FAILURES =================================== 2025-12-04T10:35:19.7484572Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T10:35:19.7484676Z Traceback (most recent call last): 2025-12-04T10:35:19.7485066Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7485288Z y_compiled = compiled_amax_fp8_quant(x, scale, 
amax_buffer_compiled) 2025-12-04T10:35:19.7485750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7485976Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7486418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7486665Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7487117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7487245Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7487698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7487988Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7488434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7488573Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7488983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7489096Z return self._compile_to_module() 2025-12-04T10:35:19.7489521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7489662Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7490106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7490214Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7490640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", 
line 3673, in load_by_key_path 2025-12-04T10:35:19.7490842Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7491340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7491445Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7491983Z File "/tmp/tmptldim1qi/qg/cqgvvref2l6hoiciiaz32zvx44lbwazoxstoh4jwszxsk55wyxef.py", line 62, in <module> 2025-12-04T10:35:19.7492380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7492482Z kernel.precompile( 2025-12-04T10:35:19.7492958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7493055Z self._precompile_worker() 2025-12-04T10:35:19.7493690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7493846Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7494360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7494527Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7494926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7495148Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7495536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7495867Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7496081Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 
2025-12-04T10:35:19.7496520Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7496605Z ^ 2025-12-04T10:35:19.7496998Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7497002Z 2025-12-04T10:35:19.7497698Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7497721Z 2025-12-04T10:35:19.7497725Z 2025-12-04T10:35:19.7497906Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7498587Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.7498592Z 2025-12-04T10:35:19.7498827Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7499008Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7499148Z frames [('total', 1)] 2025-12-04T10:35:19.7499242Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7499440Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7499639Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7499718Z graph_break [] 2025-12-04T10:35:19.7499899Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7499993Z frames [('total', 1)] 2025-12-04T10:35:19.7500087Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7500268Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7500470Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7500554Z 
graph_break [] 2025-12-04T10:35:19.7500748Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7500832Z frames [('total', 1)] 2025-12-04T10:35:19.7500927Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7501116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7501308Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7501391Z graph_break [] 2025-12-04T10:35:19.7501963Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.xml - 2025-12-04T10:35:19.7502106Z =========================== short test summary info ============================ 2025-12-04T10:35:19.7502774Z FAILED [0.2653s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7503298Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7503370Z ^ 2025-12-04T10:35:19.7503767Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7503772Z 2025-12-04T10:35:19.7504376Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7504385Z 2025-12-04T10:35:19.7504389Z 2025-12-04T10:35:19.7504575Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7505261Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.7505266Z 2025-12-04T10:35:19.7505537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7505706Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.7505880Z ================== 1 failed, 187 deselected, 2 rerun in 2.27s ================== 2025-12-04T10:35:19.7505973Z Got exit code 1 2025-12-04T10:35:19.7506444Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.7506878Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:19.7507290Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.xml 2025-12-04T10:35:19.7507426Z ============================= test session starts ============================== 2025-12-04T10:35:19.7507950Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.7508046Z cachedir: .pytest_cache 2025-12-04T10:35:19.7508500Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.7508609Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.7508699Z configfile: pytest.ini 2025-12-04T10:35:19.7509162Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:19.7509367Z collecting ... collected 188 items / 2 deselected / 186 selected 2025-12-04T10:35:19.7509485Z stepcurrent: skipping 2 already run items. 2025-12-04T10:35:19.7509587Z Running 186 items in this shard 2025-12-04T10:35:19.7509592Z 2025-12-04T10:35:19.7510589Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.7511274Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7511659Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T10:35:19.7512120Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7512606Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.7513083Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.7513457Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.7514082Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.7514525Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7514998Z E1204 10:16:43.556000 74688 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.7515427Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7515823Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.7516192Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.7516677Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.7517052Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.7517527Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.7517975Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.7518540Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.7518841Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7520489Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 
'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7520955Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7521855Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7522385Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7523147Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7523723Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7524484Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7525135Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7525728Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7526419Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7526720Z 
E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7527496Z E1204 10:16:43.556000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7527606Z ('RERUN', {'yellow': True}) [2.0495s] [ 0%] 2025-12-04T10:35:19.7528597Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.7529270Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7529648Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T10:35:19.7530122Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7530703Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.7531196Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.7531567Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.7532072Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.7532534Z E1204 10:16:44.049000 74688 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7532996Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.7533449Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7533854Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.7534229Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.7534719Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.7535084Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.7535616Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.7536062Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.7536527Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.7536830Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7538548Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 
16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7539077Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7539973Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7540548Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7541313Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7541910Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7542747Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7543405Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7543945Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7544630Z E1204 10:16:44.049000 74688 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7544950Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7545727Z E1204 10:16:44.049000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7545855Z ('RERUN', {'yellow': True}) [0.4441s] [ 0%] 2025-12-04T10:35:19.7546855Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.7547531Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7547923Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T10:35:19.7548393Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7548879Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.7549355Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.7549806Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.7550323Z E1204 10:16:44.491000 74688 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.7550772Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7551261Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.7551697Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7552105Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.7552480Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.7552962Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.7553342Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.7553816Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.7554362Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.7554824Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.7555137Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7556844Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': 
'*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7557313Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7558221Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7558767Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7559532Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7560109Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7560888Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7561555Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7562158Z E1204 10:16:44.491000 74688 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7562852Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7563164Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7563949Z E1204 10:16:44.491000 74688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7564035Z FAILED [0.4405s] [ 0%] 2025-12-04T10:35:19.7564040Z 2025-12-04T10:35:19.7564169Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.7564465Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T10:35:19.7564571Z Traceback (most recent call last): 2025-12-04T10:35:19.7564977Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7565191Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.7565610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7565923Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7566363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7566541Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7566985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7567117Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7567588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7567869Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7568316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7568458Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7568866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7568982Z return self._compile_to_module() 2025-12-04T10:35:19.7569393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7569531Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7569998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7570110Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7570553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.7570757Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7571263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7571388Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7571838Z File "/tmp/tmp4dsrgto4/e7/ce7frc7nur2mwskyxcvnk6xuunrzu6zbr44yj7npyhf66f6bjjgq.py", line 163, in 2025-12-04T10:35:19.7572240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7572345Z kernel.precompile( 
2025-12-04T10:35:19.7572900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7573014Z self._precompile_worker() 2025-12-04T10:35:19.7573532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7573689Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7574225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7574396Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7574796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7575012Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7575397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7575749Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7575946Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7576263Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7576497Z ^ 2025-12-04T10:35:19.7576896Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7576901Z 2025-12-04T10:35:19.7577534Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7577539Z 2025-12-04T10:35:19.7577543Z 2025-12-04T10:35:19.7577734Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7578449Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.7578454Z 2025-12-04T10:35:19.7578681Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7578864Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7578967Z frames [('total', 1)] 2025-12-04T10:35:19.7579113Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7579320Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7579521Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7579604Z graph_break [] 2025-12-04T10:35:19.7579905Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T10:35:19.7580010Z Traceback (most recent call last): 2025-12-04T10:35:19.7580400Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7580614Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.7581032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7581263Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7581707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7581876Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7582335Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7582463Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7583001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7583293Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7583747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7583887Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7584313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7584419Z return self._compile_to_module() 2025-12-04T10:35:19.7584838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7584973Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7585439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7585572Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7586011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.7586222Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7586722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7586919Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7587355Z File "/tmp/tmpojdd_ofp/r2/cr2yryra4s7c3n442xzvtykshgmgrlfa3nxm7rbhyhjqkt56eqyd.py", line 163, in 2025-12-04T10:35:19.7587760Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7587863Z kernel.precompile( 2025-12-04T10:35:19.7588339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7588436Z self._precompile_worker() 2025-12-04T10:35:19.7588951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7589103Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7589622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7589804Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7590195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7590420Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7590806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7591106Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7591317Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7591623Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7591719Z ^ 2025-12-04T10:35:19.7592119Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7592128Z 2025-12-04T10:35:19.7592740Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7592757Z 2025-12-04T10:35:19.7592761Z 2025-12-04T10:35:19.7592949Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7593726Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.7593731Z 2025-12-04T10:35:19.7593978Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7594168Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7594268Z frames [('total', 1)] 2025-12-04T10:35:19.7594368Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7594581Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7594782Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7594867Z graph_break [] 2025-12-04T10:35:19.7595054Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7595157Z frames [('total', 1)] 2025-12-04T10:35:19.7595256Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7595470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7595709Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7595799Z graph_break [] 2025-12-04T10:35:19.7595935Z =================================== FAILURES =================================== 2025-12-04T10:35:19.7596215Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T10:35:19.7596319Z Traceback (most recent call last): 2025-12-04T10:35:19.7596713Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7597000Z y_compiled = compiled_amax_fp8_quant(x, scale, 
amax_buffer_compiled) 2025-12-04T10:35:19.7597420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7597649Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7598085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7598266Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7598713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7598837Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7599299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7599575Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7600041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7600164Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7600580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7600699Z return self._compile_to_module() 2025-12-04T10:35:19.7601113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7601249Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7601711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7601825Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7602259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", 
line 3673, in load_by_key_path 2025-12-04T10:35:19.7602462Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7602972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7603089Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7603612Z File "/tmp/tmpfhn5ysyr/tc/ctcm6vvqjjfjontvq47nev7ixgj7avam3r4r7ncj4rlc6mie2y2m.py", line 163, in 2025-12-04T10:35:19.7604030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7604125Z kernel.precompile( 2025-12-04T10:35:19.7604605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7604718Z self._precompile_worker() 2025-12-04T10:35:19.7605230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7605391Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7605949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7606130Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7606538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7606746Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7607125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7607429Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7607712Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 
2025-12-04T10:35:19.7608220Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7608295Z ^ 2025-12-04T10:35:19.7608688Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7608693Z 2025-12-04T10:35:19.7609315Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7609319Z 2025-12-04T10:35:19.7609323Z 2025-12-04T10:35:19.7609512Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7610218Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.7610228Z 2025-12-04T10:35:19.7610464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7610647Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7610743Z frames [('total', 1)] 2025-12-04T10:35:19.7610846Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7611063Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7611258Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7611343Z graph_break [] 2025-12-04T10:35:19.7611539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7611632Z frames [('total', 1)] 2025-12-04T10:35:19.7611731Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7611934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7612127Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7612216Z graph_break [] 2025-12-04T10:35:19.7612408Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7612495Z frames [('total', 1)] 2025-12-04T10:35:19.7612601Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7612784Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7612985Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7613080Z graph_break [] 2025-12-04T10:35:19.7613798Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.xml - 2025-12-04T10:35:19.7613946Z =========================== short test summary info ============================ 2025-12-04T10:35:19.7614630Z FAILED [0.4405s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7614950Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7615042Z ^ 2025-12-04T10:35:19.7615460Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7615466Z 2025-12-04T10:35:19.7616111Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7616123Z 2025-12-04T10:35:19.7616130Z 2025-12-04T10:35:19.7616321Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7617010Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.7617014Z 2025-12-04T10:35:19.7617258Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7617519Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.7617701Z =================== 1 failed, 2 deselected, 2 rerun in 2.97s =================== 2025-12-04T10:35:19.7617783Z Got exit code 1 2025-12-04T10:35:19.7617874Z Retrying single test... 2025-12-04T10:35:19.7618299Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.xml 2025-12-04T10:35:19.7618443Z ============================= test session starts ============================== 2025-12-04T10:35:19.7618750Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.7618856Z cachedir: .pytest_cache 2025-12-04T10:35:19.7619373Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.7619492Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.7619590Z configfile: pytest.ini 2025-12-04T10:35:19.7620054Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.7620254Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.7620868Z stepcurrent: skipping 2 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.7620961Z Running 1 items in this shard 2025-12-04T10:35:19.7620973Z 2025-12-04T10:35:19.7621972Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.7622662Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7623060Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T10:35:19.7623530Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7624011Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.7624586Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.7624961Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.7625475Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.7625923Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7626407Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = 
tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.7626838Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7627247Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.7627633Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.7628122Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.7628580Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.7629059Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.7629503Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.7629983Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.7630292Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7631947Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 
2025-12-04T10:35:19.7632409Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7633321Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7633854Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7634634Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7635227Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7636033Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7636794Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7637325Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7638017Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7638329Z E1204 10:16:54.253000 74931 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7639108Z E1204 10:16:54.253000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7639226Z ('RERUN', {'yellow': True}) [2.0392s] [100%] 2025-12-04T10:35:19.7640217Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.7640905Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7641372Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T10:35:19.7641843Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7642320Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.7642816Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.7643184Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.7643695Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.7644153Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7644629Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.7645079Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7645474Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.7645903Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.7646399Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.7646783Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.7647280Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.7647726Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.7648272Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.7648597Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7650225Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 
16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7650700Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7651598Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7652158Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7653000Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7653590Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7654355Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7655007Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7655539Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7696191Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7696662Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7697540Z E1204 10:16:54.726000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7697694Z ('RERUN', {'yellow': True}) [0.4419s] [100%] 2025-12-04T10:35:19.7698948Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.7699742Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7700149Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T10:35:19.7700643Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.7701385Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.7701868Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.7702231Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.7702731Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.7703178Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.7703638Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.7704071Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.7704463Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.7704836Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.7705321Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.7705772Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.7706253Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.7706698Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.7707162Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.7707461Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7709580Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 
'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7710045Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7710943Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7711481Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7712241Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7712943Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7713847Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7714508Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7715023Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 
2025-12-04T10:35:19.7715754Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7716062Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7716825Z E1204 10:16:55.171000 74931 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7716920Z FAILED [0.4429s] [100%] 2025-12-04T10:35:19.7716925Z 2025-12-04T10:35:19.7717048Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.7717327Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T10:35:19.7717437Z Traceback (most recent call last): 2025-12-04T10:35:19.7717964Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7718174Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.7718587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7718799Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7719249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7719411Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7719845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7719972Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7720424Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7720715Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7721154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7721278Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7721700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7721798Z return self._compile_to_module() 2025-12-04T10:35:19.7722214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7722350Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7722790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7722913Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7723332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.7723526Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7724027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7724216Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7724678Z File "/tmp/tmplkzjjexn/ug/cugurnnkcfghzbzzd3fafveiff4uhmjrkd4vn7ysnlwpanfbujj6.py", line 163, in 2025-12-04T10:35:19.7725070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7725169Z kernel.precompile( 2025-12-04T10:35:19.7725697Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7725798Z self._precompile_worker() 2025-12-04T10:35:19.7726310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7726461Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7726964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7727136Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7727511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7727718Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7728098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7728461Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7728657Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7728956Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7729024Z ^ 2025-12-04T10:35:19.7729420Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7729425Z 2025-12-04T10:35:19.7730039Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7730044Z 2025-12-04T10:35:19.7730048Z 2025-12-04T10:35:19.7730238Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7730924Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.7730934Z 2025-12-04T10:35:19.7731162Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7731346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7731435Z frames [('total', 1)] 2025-12-04T10:35:19.7731538Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7731733Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7731921Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7732009Z graph_break [] 2025-12-04T10:35:19.7732283Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T10:35:19.7732396Z Traceback (most recent call last): 2025-12-04T10:35:19.7732779Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7732990Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.7733408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7733613Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7734047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7734296Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7734728Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7734859Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7735315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7735640Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7736085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7736211Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7736621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7736721Z return self._compile_to_module() 2025-12-04T10:35:19.7737135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7737275Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7737713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7737818Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7738317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.7738513Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7739022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7739174Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7739596Z File "/tmp/tmp22hhy_li/yb/cybumfw22y3yq23jtnnhbvispu7667uveuil3ivdjynahedb4qvv.py", line 163, in 2025-12-04T10:35:19.7739996Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7740084Z kernel.precompile( 2025-12-04T10:35:19.7740565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7740665Z self._precompile_worker() 2025-12-04T10:35:19.7741182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7741335Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7741840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7742004Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7742398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7742606Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7742982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7743266Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7743464Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7743776Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7743846Z ^ 2025-12-04T10:35:19.7744236Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7744241Z 2025-12-04T10:35:19.7744931Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7744937Z 2025-12-04T10:35:19.7744941Z 2025-12-04T10:35:19.7745120Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7745859Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.7745868Z 2025-12-04T10:35:19.7746095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7746282Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7746372Z frames [('total', 1)] 2025-12-04T10:35:19.7746466Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7746666Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7746849Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7746937Z graph_break [] 2025-12-04T10:35:19.7747120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7747207Z frames [('total', 1)] 2025-12-04T10:35:19.7747310Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7747491Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7747682Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7747850Z graph_break [] 2025-12-04T10:35:19.7747969Z =================================== FAILURES =================================== 2025-12-04T10:35:19.7748241Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T10:35:19.7748346Z Traceback (most recent call last): 2025-12-04T10:35:19.7748723Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7748937Z y_compiled = compiled_amax_fp8_quant(x, scale, 
amax_buffer_compiled) 2025-12-04T10:35:19.7749358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7749572Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7750016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7750176Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7750621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7750745Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7751195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7751477Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7751922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7752048Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7752467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7752567Z return self._compile_to_module() 2025-12-04T10:35:19.7752979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7753121Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7753556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7753669Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7754089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", 
line 3673, in load_by_key_path 2025-12-04T10:35:19.7754395Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7754897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7755003Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7755430Z File "/tmp/tmp_82ow0f2/q7/cq7ppvjv4btm7rjw7xmfl7sytnqxbsrzcio55evixvrdjwqjjdiy.py", line 163, in 2025-12-04T10:35:19.7755827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7755920Z kernel.precompile( 2025-12-04T10:35:19.7756394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7756488Z self._precompile_worker() 2025-12-04T10:35:19.7757000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7757152Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7757654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7757827Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7758204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7758494Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7758869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7759149Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7759346Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 
2025-12-04T10:35:19.7759655Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.7759728Z ^ 2025-12-04T10:35:19.7760123Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7760128Z 2025-12-04T10:35:19.7760737Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7760750Z 2025-12-04T10:35:19.7760753Z 2025-12-04T10:35:19.7760945Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7761632Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.7761637Z 2025-12-04T10:35:19.7761873Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7762063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7762149Z frames [('total', 1)] 2025-12-04T10:35:19.7762253Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7762450Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7762638Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7762733Z graph_break [] 2025-12-04T10:35:19.7762920Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7763009Z frames [('total', 1)] 2025-12-04T10:35:19.7763107Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7763292Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7763496Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.7763580Z graph_break [] 2025-12-04T10:35:19.7763757Z 
----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7763936Z frames [('total', 1)]
2025-12-04T10:35:19.7764030Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7764214Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7764416Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7764493Z graph_break []
2025-12-04T10:35:19.7765061Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.xml -
2025-12-04T10:35:19.7765207Z =========================== short test summary info ============================
2025-12-04T10:35:19.7765931Z FAILED [0.4429s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7766236Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7766309Z ^
2025-12-04T10:35:19.7766714Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7766719Z
2025-12-04T10:35:19.7767325Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7767330Z
2025-12-04T10:35:19.7767334Z
2025-12-04T10:35:19.7767520Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7768289Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7768294Z
2025-12-04T10:35:19.7768519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7768683Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7768854Z ================== 1 failed, 187 deselected, 2 rerun in 2.96s ==================
2025-12-04T10:35:19.7768936Z Got exit code 1
2025-12-04T10:35:19.7769027Z Retrying single test...
2025-12-04T10:35:19.7769426Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.xml
2025-12-04T10:35:19.7769568Z ============================= test session starts ==============================
2025-12-04T10:35:19.7769864Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7769961Z cachedir: .pytest_cache
2025-12-04T10:35:19.7770415Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7770520Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7770609Z configfile: pytest.ini
2025-12-04T10:35:19.7771075Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7771264Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.7771879Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7771976Z Running 1 items in this shard
2025-12-04T10:35:19.7771981Z
2025-12-04T10:35:19.7772976Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7773670Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7774124Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960
2025-12-04T10:35:19.7774586Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7775056Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7775544Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7775964Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:19.7776460Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7776908Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7777375Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7777807Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7778199Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7778648Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0
2025-12-04T10:35:19.7779192Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7779559Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0
2025-12-04T10:35:19.7780043Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7780482Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7780941Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7781254Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.7782886Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7783346Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7784237Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7784775Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7785581Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7786239Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7786986Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7787636Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7788160Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7788829Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7789138Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7789899Z E1204 10:17:04.998000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7790010Z ('RERUN', {'yellow': True}) [2.0518s] [100%]
2025-12-04T10:35:19.7791078Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7791747Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7792127Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960
2025-12-04T10:35:19.7792583Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7793059Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7793537Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7793895Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:19.7794394Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7794840Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7795314Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7795744Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7796135Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7796507Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0
2025-12-04T10:35:19.7796980Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7797355Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0
2025-12-04T10:35:19.7797932Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7798386Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7798842Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7799147Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.7800783Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7801239Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7802137Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7802750Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7803512Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7804095Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7804845Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7805537Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7806068Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7806755Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7807061Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7807959Z E1204 10:17:05.472000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7808073Z ('RERUN', {'yellow': True}) [0.4423s] [100%]
2025-12-04T10:35:19.7809062Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2
2025-12-04T10:35:19.7809738Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7810231Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960
2025-12-04T10:35:19.7810732Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7811236Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:19.7811760Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:]
2025-12-04T10:35:19.7812150Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:19.7812684Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32)
2025-12-04T10:35:19.7813168Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7813664Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK])
2025-12-04T10:35:19.7814137Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7814651Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3
2025-12-04T10:35:19.7815024Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0
2025-12-04T10:35:19.7815518Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5)
2025-12-04T10:35:19.7815920Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0
2025-12-04T10:35:19.7816412Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7)
2025-12-04T10:35:19.7816852Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv)
2025-12-04T10:35:19.7817309Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None)
2025-12-04T10:35:19.7817617Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.7819303Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7819765Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7820650Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7821193Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7822030Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7822611Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7823355Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7824012Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7824533Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7825206Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7825516Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7826274Z E1204 10:17:05.916000 75173 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7826443Z FAILED [0.4427s] [100%]
2025-12-04T10:35:19.7826448Z
2025-12-04T10:35:19.7826567Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.7826842Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7826949Z Traceback (most recent call last):
2025-12-04T10:35:19.7827331Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7827545Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7827964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7828173Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7828613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7828779Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7829213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7829339Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7829794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7830071Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7830515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7830636Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7831047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7831148Z return self._compile_to_module()
2025-12-04T10:35:19.7831571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7831706Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7832144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7832259Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7832759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7832959Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7833460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7833564Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7834011Z File "/tmp/tmpmpmtyyg1/od/codwrqcbdntqen3knoeeafd6qjno45k4qvwyjg6fbt2te2lvy5gk.py", line 163, in
2025-12-04T10:35:19.7834407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7834501Z kernel.precompile(
2025-12-04T10:35:19.7834980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7835076Z self._precompile_worker()
2025-12-04T10:35:19.7835729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7835879Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7836383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7836555Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7837047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7837248Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7837623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7837907Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7838105Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7838413Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7838485Z ^
2025-12-04T10:35:19.7838881Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7838886Z
2025-12-04T10:35:19.7839493Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7839504Z
2025-12-04T10:35:19.7839508Z
2025-12-04T10:35:19.7839700Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7840389Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7840394Z
2025-12-04T10:35:19.7840629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7840809Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7840895Z frames [('total', 1)]
2025-12-04T10:35:19.7841000Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7841203Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7841387Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7841482Z graph_break []
2025-12-04T10:35:19.7841756Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7841862Z Traceback (most recent call last):
2025-12-04T10:35:19.7842250Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7842453Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7842953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7843166Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7843600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7843768Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7844200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7844332Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7844784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7845059Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7845522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7845660Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7846091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7846199Z return self._compile_to_module()
2025-12-04T10:35:19.7846609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7846829Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7847265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7847371Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7847792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7847985Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7848493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7848600Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7849040Z File "/tmp/tmprcltxuxc/iw/ciwlj6ht3fp3sbsrqwzcp3tnyqgfl7zs5nrmmycc3hh66kupfm2e.py", line 163, in
2025-12-04T10:35:19.7849436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7849530Z kernel.precompile(
2025-12-04T10:35:19.7849998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7850102Z self._precompile_worker()
2025-12-04T10:35:19.7850609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7850769Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7851272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7851439Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7851826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7852035Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7852416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7852700Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7852895Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7853208Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7853367Z ^
2025-12-04T10:35:19.7853759Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7853770Z
2025-12-04T10:35:19.7854378Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7854384Z
2025-12-04T10:35:19.7854392Z
2025-12-04T10:35:19.7854576Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7855275Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7855280Z
2025-12-04T10:35:19.7855504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7855694Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7855784Z frames [('total', 1)]
2025-12-04T10:35:19.7855881Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7856088Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7856272Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7856352Z graph_break []
2025-12-04T10:35:19.7856534Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7856695Z frames [('total', 1)]
2025-12-04T10:35:19.7856793Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7856980Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7857173Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7857262Z graph_break []
2025-12-04T10:35:19.7857382Z =================================== FAILURES ===================================
2025-12-04T10:35:19.7857655Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _
2025-12-04T10:35:19.7857768Z Traceback (most recent call last):
2025-12-04T10:35:19.7858149Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.7858363Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.7858778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.7858991Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.7859489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.7859653Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.7860083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.7860213Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.7860673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.7860950Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.7861388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.7861518Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.7861930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.7862029Z return self._compile_to_module()
2025-12-04T10:35:19.7862449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.7862586Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.7863115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.7863232Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.7863650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.7863846Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.7864358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.7864467Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.7864929Z File "/tmp/tmpub9pboc1/mh/cmhc5lgbpxu6y6kvpy4pvjjbgwojfgoaowpoqn6xducagndxdhxr.py", line 163, in
2025-12-04T10:35:19.7865347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.7865446Z kernel.precompile(
2025-12-04T10:35:19.7865952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.7866048Z self._precompile_worker()
2025-12-04T10:35:19.7866563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.7866715Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.7867384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7867560Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7867942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7868165Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7868549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7868837Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7869047Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7869352Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7869430Z ^
2025-12-04T10:35:19.7869824Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7869829Z
2025-12-04T10:35:19.7870434Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7870438Z
2025-12-04T10:35:19.7870442Z
2025-12-04T10:35:19.7870634Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7871332Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7871337Z
2025-12-04T10:35:19.7871576Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7871762Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7871854Z frames [('total', 1)]
2025-12-04T10:35:19.7871957Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7872154Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7872340Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7872432Z graph_break []
2025-12-04T10:35:19.7872608Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7872704Z frames [('total', 1)]
2025-12-04T10:35:19.7872800Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7873066Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7873273Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7873359Z graph_break []
2025-12-04T10:35:19.7873539Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.7873642Z frames [('total', 1)]
2025-12-04T10:35:19.7873740Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.7873933Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.7874146Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.7874230Z graph_break []
2025-12-04T10:35:19.7874817Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.xml -
2025-12-04T10:35:19.7874967Z =========================== short test summary info ============================
2025-12-04T10:35:19.7875701Z FAILED [0.4427s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.7876027Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.7876106Z ^
2025-12-04T10:35:19.7876515Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7876628Z
2025-12-04T10:35:19.7877241Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.7877245Z
2025-12-04T10:35:19.7877249Z
2025-12-04T10:35:19.7877442Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.7878150Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7878155Z
2025-12-04T10:35:19.7878386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.7878548Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.7878721Z ================== 1 failed, 187 deselected, 2 rerun in 2.97s ==================
2025-12-04T10:35:19.7878807Z Got exit code 1
2025-12-04T10:35:19.7879302Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.7879666Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.7880079Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.xml
2025-12-04T10:35:19.7880221Z ============================= test session starts ==============================
2025-12-04T10:35:19.7880522Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.7880627Z cachedir: .pytest_cache
2025-12-04T10:35:19.7881083Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.7881202Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.7881296Z configfile: pytest.ini
2025-12-04T10:35:19.7881774Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.7881984Z collecting ... collected 188 items / 3 deselected / 185 selected
2025-12-04T10:35:19.7882105Z stepcurrent: skipping 3 already run items.
2025-12-04T10:35:19.7882206Z Running 185 items in this shard
2025-12-04T10:35:19.7882210Z
2025-12-04T10:35:19.7883379Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7884290Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7884675Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.7885063Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.7885471Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.7885924Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7886393Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7886908Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7887405Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7887966Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7888358Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.7888900Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.7889360Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7889822Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.7890331Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.7890791Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.7891245Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7891677Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.7892090Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.7892506Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.7893156Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.7893613Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7894115Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7894680Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.7895173Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.7895649Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7896098Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.7896490Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.7896976Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.7897377Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.7897868Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.7898335Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.7898940Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.7899548Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.7900155Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.7900461Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.7902461Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7902934Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7903853Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7904388Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7905165Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7905801Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7906555Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7907293Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7907998Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7908911Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7909235Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7910015Z E1204 10:17:15.356000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7910129Z ('RERUN', {'yellow': True}) [1.7100s] [ 0%]
2025-12-04T10:35:19.7911215Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7912221Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7912581Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.7912978Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.7913377Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.7913853Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7914314Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7914817Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7915329Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7915804Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7916208Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.7916744Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.7917209Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7917671Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.7918163Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.7918627Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.7919208Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7919645Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.7920054Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.7920451Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.7921107Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.7921549Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7922063Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7922547Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.7923019Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.7923563Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7923971Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.7924366Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.7924856Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.7925243Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.7925791Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.7926265Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.7926878Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.7927364Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.7927983Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.7928297Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.7930368Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.7930846Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.7931740Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.7932292Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.7933051Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.7933644Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.7934393Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.7935066Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.7935659Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.7936614Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7936924Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.7937700Z E1204 10:17:15.659000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.7937830Z ('RERUN', {'yellow': True}) [0.2696s] [ 0%]
2025-12-04T10:35:19.7938903Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.7939855Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.7940219Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.7940609Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.7940997Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.7941457Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.7941978Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.7942622Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.7943406Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.7943902Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.7944293Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.7944845Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.7945294Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.7945821Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.7946316Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.7946777Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.7947238Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.7947741Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.7948166Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.7948565Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.7949217Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.7949662Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.7950162Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.7950664Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.7951137Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.7951590Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.7952011Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.7952397Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.7952904Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.7953291Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.7953785Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.7954247Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.7954923Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.7955440Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.7956061Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T10:35:19.7956380Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.7958372Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.7958841Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.7959810Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7960350Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7961117Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7961695Z E1204 10:17:15.928000 75415 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7962456Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7963120Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7963648Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.7964547Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.7964867Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.7965679Z E1204 10:17:15.928000 75415 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7965767Z FAILED [0.2672s] [ 0%] 2025-12-04T10:35:19.7965780Z 2025-12-04T10:35:19.7965899Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.7966178Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T10:35:19.7966283Z Traceback (most recent call last): 2025-12-04T10:35:19.7966771Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7966976Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.7967399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7967608Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7968064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7968228Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7968662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7968790Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7969251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7969527Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7969982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7970103Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7970594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:19.7970696Z return self._compile_to_module() 2025-12-04T10:35:19.7971106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7971250Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7971689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7971811Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7972231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.7972424Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7972941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7973052Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7973482Z File "/tmp/tmp2tcz4hf_/r2/cr2e2rloto7skiacnbdby5e3xtqlzcpjwobmouy2pw6iv43ft3p7.py", line 62, in 2025-12-04T10:35:19.7973886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7973977Z kernel.precompile( 2025-12-04T10:35:19.7974469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7974565Z self._precompile_worker() 2025-12-04T10:35:19.7975078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7975247Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7975795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7975990Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7976371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7976585Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7976975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7977342Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7977554Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7978077Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.7978158Z ^ 2025-12-04T10:35:19.7978578Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7978582Z 2025-12-04T10:35:19.7979318Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7979324Z 2025-12-04T10:35:19.7979327Z 2025-12-04T10:35:19.7979524Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.7980224Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.7980229Z 2025-12-04T10:35:19.7980455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.7980646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.7980730Z frames [('total', 1)] 2025-12-04T10:35:19.7980927Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.7981132Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.7981322Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.7986198Z graph_break [] 2025-12-04T10:35:19.7986499Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T10:35:19.7986603Z Traceback (most recent call last): 2025-12-04T10:35:19.7987000Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.7987204Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.7987626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.7987835Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.7988266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.7988439Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.7988867Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.7988984Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.7989436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.7989716Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.7990156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.7990275Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.7990677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.7990785Z return self._compile_to_module() 2025-12-04T10:35:19.7991192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.7991332Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.7991772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.7991878Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.7993106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.7993309Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.7993807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.7993922Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.7994351Z File "/tmp/tmp18ig6t68/y6/cy6tnzq77225ilakmhbf4p42xssnjrdohzdhakzjxu64qimkmlkw.py", line 62, in 2025-12-04T10:35:19.7994748Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.7994836Z kernel.precompile( 2025-12-04T10:35:19.7995306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.7995406Z self._precompile_worker() 2025-12-04T10:35:19.7995917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.7996065Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.7996566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.7996814Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.7997198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.7997400Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.7997769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.7998050Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.7998244Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.7998764Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.7998833Z ^ 2025-12-04T10:35:19.7999218Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.7999231Z 2025-12-04T10:35:19.7999837Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.7999842Z 2025-12-04T10:35:19.7999846Z 2025-12-04T10:35:19.8000025Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8000711Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.8000716Z 2025-12-04T10:35:19.8000937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8001117Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8001199Z frames [('total', 1)] 2025-12-04T10:35:19.8001292Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8001499Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8001683Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8001760Z graph_break [] 2025-12-04T10:35:19.8001936Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8002019Z frames [('total', 1)] 2025-12-04T10:35:19.8002112Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8002292Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8002570Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8002654Z graph_break [] 2025-12-04T10:35:19.8002768Z =================================== FAILURES =================================== 2025-12-04T10:35:19.8003037Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T10:35:19.8003143Z Traceback (most recent call 
last): 2025-12-04T10:35:19.8003522Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8003729Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8004141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8004345Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8004780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8004943Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8005373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8005520Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8005994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8006372Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8006809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8006927Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8007332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8007436Z return self._compile_to_module() 2025-12-04T10:35:19.8008082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8008219Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8008657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 
2495, in _compile_to_module_lines 2025-12-04T10:35:19.8008769Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8009184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8009374Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8009870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8009971Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8010406Z File "/tmp/tmpti48yo5m/ok/cok64jrkydt6lqpflqurrdhle3vr5z4rjecaw6aeine4jc6sejas.py", line 62, in 2025-12-04T10:35:19.8010796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8010884Z kernel.precompile( 2025-12-04T10:35:19.8011355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8011450Z self._precompile_worker() 2025-12-04T10:35:19.8011955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8012101Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8012603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8012768Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8013281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8013487Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8013862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 
2025-12-04T10:35:19.8014140Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8014338Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8014852Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.8014923Z ^ 2025-12-04T10:35:19.8015329Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8015335Z 2025-12-04T10:35:19.8015977Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8015982Z 2025-12-04T10:35:19.8015986Z 2025-12-04T10:35:19.8016169Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8016845Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.8016958Z 2025-12-04T10:35:19.8017182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8017358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8017439Z frames [('total', 1)] 2025-12-04T10:35:19.8017535Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8017732Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8017920Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8018000Z graph_break [] 2025-12-04T10:35:19.8018173Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8018253Z frames [('total', 1)] 2025-12-04T10:35:19.8018347Z stats [('calls_captured', 7)] 
2025-12-04T10:35:19.8018526Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8018726Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8018802Z graph_break [] 2025-12-04T10:35:19.8018974Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8019109Z frames [('total', 1)] 2025-12-04T10:35:19.8019202Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8019380Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8019574Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8019659Z graph_break [] 2025-12-04T10:35:19.8020310Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.xml - 2025-12-04T10:35:19.8020507Z =========================== short test summary info ============================ 2025-12-04T10:35:19.8021396Z FAILED [0.2672s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8021934Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.8022009Z ^ 2025-12-04T10:35:19.8022412Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8022418Z 2025-12-04T10:35:19.8023131Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8023137Z 2025-12-04T10:35:19.8023140Z 2025-12-04T10:35:19.8023328Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8024021Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.8024030Z 2025-12-04T10:35:19.8024261Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8024419Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.8024590Z =================== 1 failed, 3 deselected, 2 rerun in 2.28s =================== 2025-12-04T10:35:19.8024674Z Got exit code 1 2025-12-04T10:35:19.8024771Z Retrying single test... 2025-12-04T10:35:19.8025183Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.xml 2025-12-04T10:35:19.8025347Z ============================= test session starts ============================== 2025-12-04T10:35:19.8025679Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.8025774Z cachedir: .pytest_cache 2025-12-04T10:35:19.8026230Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.8026417Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.8026509Z configfile: pytest.ini 2025-12-04T10:35:19.8026979Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.8027170Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.8027797Z stepcurrent: skipping 3 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.8027901Z Running 1 items in this shard 2025-12-04T10:35:19.8027906Z 2025-12-04T10:35:19.8028981Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T10:35:19.8029892Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.8030262Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.8030652Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:19.8031049Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.8031513Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.8031976Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8032482Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.8032985Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.8033471Z E1204 10:17:26.028000 75596 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.8033944Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:19.8034584Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:19.8035034Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8035506Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:19.8036002Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:19.8036465Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:19.8036923Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.8037342Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.8037760Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.8038264Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.8038959Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:19.8039419Z E1204 10:17:26.028000 75596 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8039955Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8040469Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8040964Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8041444Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8041877Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8042288Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.8042808Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8043216Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.8043736Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8044228Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8044872Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8045387Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8046154Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8046469Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.8048471Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8048942Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8049844Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8050567Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8051332Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8051926Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8052682Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8053350Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8053878Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8054783Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8055108Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8055927Z E1204 10:17:26.028000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8056045Z ('RERUN', {'yellow': True}) [1.7094s] [100%]
2025-12-04T10:35:19.8057131Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8058032Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8058480Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.8058866Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.8059319Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.8059786Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8060253Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8060752Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8061254Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8061734Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8062118Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.8062739Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8063185Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8063651Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8064152Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8064607Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8065061Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8065510Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.8065951Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.8066350Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.8067001Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8067445Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8067950Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8068450Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8068924Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8069371Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8069867Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8070261Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.8070755Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8071148Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.8071640Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8072105Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8072713Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8073205Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8073810Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8074197Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.8076197Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8076663Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8077560Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8078101Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8078869Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8079452Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8080208Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8080872Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8081397Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8082376Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8082694Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8083461Z E1204 10:17:26.329000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8083579Z ('RERUN', {'yellow': True}) [0.2668s] [100%]
2025-12-04T10:35:19.8084649Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8085547Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8085915Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.8086377Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.8086773Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.8087231Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8087704Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8088207Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8088705Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8089186Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8089570Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.8090107Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8090560Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8091023Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8091520Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8091978Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8092432Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8092855Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.8093369Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.8093771Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.8094415Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8094862Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8095365Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8095853Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8096339Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8096786Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8097202Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8097594Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.8098200Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8098592Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.8099130Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8099596Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8100200Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8100695Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8101300Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8101609Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.8103619Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8104084Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8105062Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8105605Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8106370Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8106961Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8107720Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8108793Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8109319Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8110226Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8110661Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8111432Z E1204 10:17:26.596000 75596 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8111521Z FAILED [0.2661s] [100%]
2025-12-04T10:35:19.8111525Z
2025-12-04T10:35:19.8111655Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.8111942Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8112048Z Traceback (most recent call last):
2025-12-04T10:35:19.8112439Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8112658Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8113079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8113297Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8113738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8113906Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8114350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8114475Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8114940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8115219Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8115679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8115806Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8116222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8116330Z return self._compile_to_module()
2025-12-04T10:35:19.8116854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8116997Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8117444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8117556Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8117985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8118192Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8118697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8118809Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8119230Z File "/tmp/tmptkyk_avr/u6/cu6mnqj6wdu6zrod277mtym6qctsmbz7osjsp6k62riedppwvahg.py", line 62, in <module>
2025-12-04T10:35:19.8119639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8119733Z kernel.precompile(
2025-12-04T10:35:19.8120214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8120316Z self._precompile_worker()
2025-12-04T10:35:19.8120829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8121067Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8121582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8121752Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8122147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8122357Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8122739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8123031Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8123232Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8123767Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8123843Z ^
2025-12-04T10:35:19.8124239Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8124245Z
2025-12-04T10:35:19.8124865Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8124870Z
2025-12-04T10:35:19.8124874Z
2025-12-04T10:35:19.8125061Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8125804Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8125814Z
2025-12-04T10:35:19.8126045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8126233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8126323Z frames [('total', 1)]
2025-12-04T10:35:19.8126422Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8126630Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8126823Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8126989Z graph_break []
2025-12-04T10:35:19.8127273Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8127379Z Traceback (most recent call last):
2025-12-04T10:35:19.8127766Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8127978Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8128404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8128620Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8129061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8129229Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8129676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8129802Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8130265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8130546Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8131074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8131203Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8131615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8131718Z return self._compile_to_module()
2025-12-04T10:35:19.8132136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8132280Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8132739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8132857Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8133286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8133496Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8134003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8134117Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8134553Z File "/tmp/tmpgc4mg235/w4/cw47ikhnavo7czt2ms3l43nhty4ktuivme76puqsb7f7ng4a6gm2.py", line 62, in <module>
2025-12-04T10:35:19.8134964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8135070Z kernel.precompile(
2025-12-04T10:35:19.8135577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8135702Z self._precompile_worker()
2025-12-04T10:35:19.8136231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8136392Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8136921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8137092Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8137481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8137808Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8138192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8138490Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8138690Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8139284Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8139367Z ^
2025-12-04T10:35:19.8139766Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8139771Z
2025-12-04T10:35:19.8140395Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8140404Z
2025-12-04T10:35:19.8140408Z
2025-12-04T10:35:19.8140599Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8141289Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8141294Z
2025-12-04T10:35:19.8141533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8141802Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8141901Z frames [('total', 1)]
2025-12-04T10:35:19.8142000Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8142211Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8142415Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8142504Z graph_break []
2025-12-04T10:35:19.8142697Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8142794Z frames [('total', 1)]
2025-12-04T10:35:19.8142896Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8143090Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8143298Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8143386Z graph_break []
2025-12-04T10:35:19.8143518Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8143807Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _
2025-12-04T10:35:19.8143918Z Traceback (most recent call last):
2025-12-04T10:35:19.8144319Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant
2025-12-04T10:35:19.8144532Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:19.8144970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8145188Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8145682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8145861Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8146306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8146436Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8146901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8147182Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8147731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8147865Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8148281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8148389Z return self._compile_to_module()
2025-12-04T10:35:19.8148811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8148966Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8149414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 
2495, in _compile_to_module_lines 2025-12-04T10:35:19.8149527Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8149963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8150166Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8150673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8150789Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8151228Z File "/tmp/tmprauk3sv1/zl/czlrzt72phjozf5sfk4zefvsz32rkupmqh72sthun5kmvddyas56.py", line 62, in 2025-12-04T10:35:19.8151634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8151815Z kernel.precompile( 2025-12-04T10:35:19.8152295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8152401Z self._precompile_worker() 2025-12-04T10:35:19.8152920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8153081Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8153597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8153773Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8154171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8154388Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8154770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 
2025-12-04T10:35:19.8155066Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8155265Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8155807Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8155884Z ^
2025-12-04T10:35:19.8156281Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8156286Z
2025-12-04T10:35:19.8156906Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8156916Z
2025-12-04T10:35:19.8156920Z
2025-12-04T10:35:19.8157109Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8157806Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8157811Z
2025-12-04T10:35:19.8158042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8158317Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8158408Z frames [('total', 1)]
2025-12-04T10:35:19.8158507Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8158723Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8158920Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8159006Z graph_break []
2025-12-04T10:35:19.8159201Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8159288Z frames [('total', 1)]
2025-12-04T10:35:19.8159389Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8159586Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8159789Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8159879Z graph_break []
2025-12-04T10:35:19.8160064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8160161Z frames [('total', 1)]
2025-12-04T10:35:19.8160263Z stats [('calls_captured', 7)]
2025-12-04T10:35:19.8160451Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8160656Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8160749Z graph_break []
2025-12-04T10:35:19.8161316Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.xml -
2025-12-04T10:35:19.8161561Z =========================== short test summary info ============================
2025-12-04T10:35:19.8162233Z FAILED [0.2661s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8162765Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8162854Z ^
2025-12-04T10:35:19.8163255Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8163260Z
2025-12-04T10:35:19.8163886Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8163896Z
2025-12-04T10:35:19.8163900Z
2025-12-04T10:35:19.8164086Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8164779Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8164791Z
2025-12-04T10:35:19.8165026Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8165188Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8165384Z ================== 1 failed, 187 deselected, 2 rerun in 2.28s ==================
2025-12-04T10:35:19.8165489Z Got exit code 1
2025-12-04T10:35:19.8165599Z Retrying single test...
2025-12-04T10:35:19.8166022Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.xml
2025-12-04T10:35:19.8166167Z ============================= test session starts ==============================
2025-12-04T10:35:19.8166483Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8166582Z cachedir: .pytest_cache
2025-12-04T10:35:19.8167036Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8167160Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8167255Z configfile: pytest.ini
2025-12-04T10:35:19.8167812Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8168010Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.8168632Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.8168739Z Running 1 items in this shard
2025-12-04T10:35:19.8168751Z
2025-12-04T10:35:19.8169830Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8170748Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8171119Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.8171513Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.8171916Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.8172488Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8172961Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8173463Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8173970Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8174449Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8174837Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.8175392Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8175845Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8176315Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8176820Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8177277Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8177740Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8178164Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.8178582Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.8178979Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.8179765Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8180213Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8180716Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8181218Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8181694Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8182146Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8182582Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8182976Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.8183477Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8183947Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.8184444Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8184918Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8185532Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8186028Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8186634Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8187032Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.8189041Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8189509Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8190414Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8190953Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8191811Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8192398Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8193161Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8193828Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8194364Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8195297Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8195652Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8196501Z E1204 10:17:36.678000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8196621Z ('RERUN', {'yellow': True}) [1.7158s] [100%]
2025-12-04T10:35:19.8197704Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8198608Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8198981Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.8199374Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.8199768Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.8200233Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8200706Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8201215Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8201721Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8202209Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8202594Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.8203139Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8203688Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8204156Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8204660Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8205132Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8205635Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8206065Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.8206484Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.8206892Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.8207544Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8208207Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8208724Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8209217Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.8209710Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.8210165Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32)
2025-12-04T10:35:19.8210588Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.8210994Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.8211490Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.8211894Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.8212393Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.8212870Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.8213476Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask)
2025-12-04T10:35:19.8213973Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.8214584Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None)
2025-12-04T10:35:19.8214894Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.8217076Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8217571Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8218530Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8219135Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8219915Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8220638Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8221396Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8222071Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8222600Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8223514Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8223831Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8224606Z E1204 10:17:36.984000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8224725Z ('RERUN', {'yellow': True}) [0.2720s] [100%]
2025-12-04T10:35:19.8225851Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0
2025-12-04T10:35:19.8226755Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.8227129Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.8227517Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.8228111Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.8228577Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8229043Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8229547Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8230059Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8230533Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8230922Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.8231463Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.8231918Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8232387Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.8232966Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.8233433Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.8233900Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8234326Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.8234751Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.8235154Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.8235871Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.8236315Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8236830Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.8237328Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T10:35:19.8237809Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T10:35:19.8238265Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8238688Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:19.8239083Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:19.8239579Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:19.8240052Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:19.8240553Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:19.8241020Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:19.8241636Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T10:35:19.8242127Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 
2025-12-04T10:35:19.8242740Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T10:35:19.8243056Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8245057Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8245608Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8246504Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8247047Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8247819Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8248410Z E1204 10:17:37.255000 75777 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8249177Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8249843Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8250372Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8251281Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.8251604Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8252450Z E1204 10:17:37.255000 75777 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8252550Z FAILED [0.2693s] [100%] 2025-12-04T10:35:19.8252555Z 2025-12-04T10:35:19.8252683Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.8252979Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T10:35:19.8253091Z Traceback (most recent call last): 2025-12-04T10:35:19.8253481Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8253712Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8254136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8254358Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8254804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8254974Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8255427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8255642Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8256105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8256387Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8256843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8256985Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8257405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:19.8257513Z return self._compile_to_module() 2025-12-04T10:35:19.8257940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8258087Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8258539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8258655Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8259133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8259349Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8259859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8259968Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8260426Z File "/tmp/tmp6z14z9by/kv/ckvppj2tnkky6jfblaitlix7vhwddddcua3koq3d4tlnx6m6elm7.py", line 62, in 2025-12-04T10:35:19.8260828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8260940Z kernel.precompile( 2025-12-04T10:35:19.8261421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8261523Z self._precompile_worker() 2025-12-04T10:35:19.8262047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8262200Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8262827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8263006Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8263396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8263616Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8264002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8264293Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8264507Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8265038Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.8265124Z ^ 2025-12-04T10:35:19.8265534Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8265539Z 2025-12-04T10:35:19.8266157Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8266162Z 2025-12-04T10:35:19.8266241Z 2025-12-04T10:35:19.8266435Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8267130Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.8267134Z 2025-12-04T10:35:19.8267375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8267567Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8267665Z frames [('total', 1)] 2025-12-04T10:35:19.8267772Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8267980Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8268183Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8268269Z graph_break [] 2025-12-04T10:35:19.8268550Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T10:35:19.8268669Z Traceback (most recent call last): 2025-12-04T10:35:19.8269066Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8269280Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8269706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8269921Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8270374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8270542Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8270983Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8271114Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8271582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8271875Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8272331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8272460Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8272965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8273071Z return self._compile_to_module() 2025-12-04T10:35:19.8273491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8273637Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8274090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8274214Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8274643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8274848Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8275364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8275477Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8275929Z File "/tmp/tmpygafdbo0/ct/cctq473uotyp5vzfipgmvvuhwlay5yshfdsmyv6eboiy62zhnwh6.py", line 62, in 2025-12-04T10:35:19.8276329Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8276433Z kernel.precompile( 2025-12-04T10:35:19.8277007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8277108Z self._precompile_worker() 2025-12-04T10:35:19.8277628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8277788Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8278310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8278486Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8278875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8279092Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8279477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8279771Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8284114Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8284668Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.8284744Z ^ 2025-12-04T10:35:19.8285156Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8285162Z 2025-12-04T10:35:19.8285826Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8285831Z 2025-12-04T10:35:19.8285835Z 2025-12-04T10:35:19.8286028Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8286727Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.8286732Z 2025-12-04T10:35:19.8286966Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8287159Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8287250Z frames [('total', 1)] 2025-12-04T10:35:19.8287354Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8287670Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8287867Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8287958Z graph_break [] 2025-12-04T10:35:19.8288144Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8288233Z frames [('total', 1)] 2025-12-04T10:35:19.8288346Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8288537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8288739Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8288827Z graph_break [] 2025-12-04T10:35:19.8288958Z =================================== FAILURES =================================== 2025-12-04T10:35:19.8289241Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T10:35:19.8289348Z Traceback (most recent call 
last): 2025-12-04T10:35:19.8289743Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8289960Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8290383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8290601Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8291131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8291299Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8291741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8291868Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8292342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8292626Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8293080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8293213Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8293635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8293740Z return self._compile_to_module() 2025-12-04T10:35:19.8294158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8294305Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8294755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 
2495, in _compile_to_module_lines 2025-12-04T10:35:19.8294872Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8295301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8295510Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8296060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8296175Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8296598Z File "/tmp/tmpj_4eqc11/5t/c5ty6ahpawh6bvwevrunlvix5gfgqhxerb56clsai43plrubxokf.py", line 62, in 2025-12-04T10:35:19.8296997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8297099Z kernel.precompile( 2025-12-04T10:35:19.8297658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8297759Z self._precompile_worker() 2025-12-04T10:35:19.8298276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8298430Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8298950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8299207Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8299594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8299810Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8300190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 
2025-12-04T10:35:19.8300483Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8300686Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8301214Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.8301294Z ^ 2025-12-04T10:35:19.8301773Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8301778Z 2025-12-04T10:35:19.8302391Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8302399Z 2025-12-04T10:35:19.8302403Z 2025-12-04T10:35:19.8302593Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8303289Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.8303294Z 2025-12-04T10:35:19.8303527Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8303714Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8303806Z frames [('total', 1)] 2025-12-04T10:35:19.8303912Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8304119Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8304312Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8304395Z graph_break [] 2025-12-04T10:35:19.8304579Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8304675Z frames [('total', 1)] 2025-12-04T10:35:19.8304774Z stats [('calls_captured', 7)] 
2025-12-04T10:35:19.8304965Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8305170Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8305254Z graph_break [] 2025-12-04T10:35:19.8305443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8305530Z frames [('total', 1)] 2025-12-04T10:35:19.8305627Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8305815Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8306023Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8306105Z graph_break [] 2025-12-04T10:35:19.8306681Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.xml - 2025-12-04T10:35:19.8306830Z =========================== short test summary info ============================ 2025-12-04T10:35:19.8307620Z FAILED [0.2693s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8308309Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.8308387Z ^ 2025-12-04T10:35:19.8308789Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8308801Z 2025-12-04T10:35:19.8309414Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8309419Z 2025-12-04T10:35:19.8309422Z 2025-12-04T10:35:19.8309616Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8310305Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.8310310Z 2025-12-04T10:35:19.8310545Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8310701Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.8310875Z ================== 1 failed, 187 deselected, 2 rerun in 2.29s ================== 2025-12-04T10:35:19.8311090Z Got exit code 1 2025-12-04T10:35:19.8311572Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.8311931Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:19.8312346Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.xml 2025-12-04T10:35:19.8312487Z ============================= test session starts ============================== 2025-12-04T10:35:19.8312799Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.8312893Z cachedir: .pytest_cache 2025-12-04T10:35:19.8313347Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.8313458Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.8313551Z configfile: pytest.ini 2025-12-04T10:35:19.8314029Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:19.8314224Z collecting ... collected 188 items / 4 deselected / 184 selected 2025-12-04T10:35:19.8314346Z stepcurrent: skipping 4 already run items. 2025-12-04T10:35:19.8314448Z Running 184 items in this shard 2025-12-04T10:35:19.8314452Z 2025-12-04T10:35:19.8315499Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.8316214Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8316614Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T10:35:19.8317088Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8317572Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.8318056Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.8318539Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.8319048Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.8319497Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8319978Z E1204 10:17:47.755000 75958 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.8320412Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8320811Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.8321191Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.8321675Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.8322050Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.8322612Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.8323066Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.8323530Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.8323840Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8325514Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 
'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8326008Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8326909Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8327452Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8328218Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8328806Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8329562Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8330307Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8330831Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8331516Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8331831Z 
E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8332600Z E1204 10:17:47.755000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8332714Z ('RERUN', {'yellow': True}) [2.0916s] [ 0%] 2025-12-04T10:35:19.8333725Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.8334404Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8334878Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T10:35:19.8335345Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8335823Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.8336313Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.8336680Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.8337184Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.8337634Z E1204 10:17:48.260000 75958 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8338114Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.8338550Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8338945Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.8339376Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.8339863Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.8340235Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.8340723Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.8341169Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.8341636Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.8342026Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8343664Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 
16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8344132Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8345030Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8345611Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8346385Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8347073Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8347826Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8348497Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8349020Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8349699Z E1204 10:17:48.260000 75958 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8350024Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8350788Z E1204 10:17:48.260000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8350903Z ('RERUN', {'yellow': True}) [0.4729s] [ 0%] 2025-12-04T10:35:19.8351910Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.8352592Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8352991Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T10:35:19.8353456Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8353940Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.8354504Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.8354875Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.8355383Z E1204 10:17:48.734000 75958 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.8355834Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8356306Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.8356740Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8357148Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.8357522Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.8358004Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.8358384Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.8358944Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.8359397Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.8359861Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.8360172Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8361806Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': 
'*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8362272Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8363175Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8363715Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8364480Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8365068Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8365825Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8366564Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8367093Z E1204 10:17:48.734000 75958 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8367773Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8368088Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8368855Z E1204 10:17:48.734000 75958 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8368944Z FAILED [0.4722s] [ 0%] 2025-12-04T10:35:19.8368952Z 2025-12-04T10:35:19.8369081Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.8369368Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:19.8369474Z Traceback (most recent call last): 2025-12-04T10:35:19.8369866Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8370165Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8370583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8370802Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8371243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8371418Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8371858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8371985Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8372447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8372734Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8373183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8373311Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8373725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8373833Z return self._compile_to_module() 2025-12-04T10:35:19.8374252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8374397Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8374844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8374954Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8375388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8375589Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8376093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8376206Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8376717Z File "/tmp/tmpr8r26_lm/6y/c6ykhis2ft6fc7sjdns64at5bavcwegprgynyfqkhmobcqcs532z.py", line 168, in 2025-12-04T10:35:19.8377125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8377220Z kernel.precompile( 
2025-12-04T10:35:19.8377697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8377800Z self._precompile_worker() 2025-12-04T10:35:19.8378318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8378474Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8378987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8379208Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8379601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8379813Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8380194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8380488Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8380769Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8381079Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8381155Z ^ 2025-12-04T10:35:19.8381551Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8381556Z 2025-12-04T10:35:19.8382182Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8382187Z 2025-12-04T10:35:19.8382190Z 2025-12-04T10:35:19.8382378Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8383080Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8383090Z 2025-12-04T10:35:19.8383320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8383510Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8383598Z frames [('total', 1)] 2025-12-04T10:35:19.8383696Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8383903Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8384094Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8384178Z graph_break [] 2025-12-04T10:35:19.8384472Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:19.8384578Z Traceback (most recent call last): 2025-12-04T10:35:19.8384965Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8385177Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8385645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8385867Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8386309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8386475Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8386999Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8387126Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8387588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8387867Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8388314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8388450Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8388862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8388967Z return self._compile_to_module() 2025-12-04T10:35:19.8389391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8389536Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8389982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8390094Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8390518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8390821Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8391327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8391439Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8391884Z File "/tmp/tmpb7zf3v9d/3c/c3cdzfioke7fv46octmqsd53fsmncxohaogbdcg6zem3d4r5omkj.py", line 168, in 2025-12-04T10:35:19.8392287Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8392384Z kernel.precompile( 2025-12-04T10:35:19.8392863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8392963Z self._precompile_worker() 2025-12-04T10:35:19.8393478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8393639Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8394155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8394327Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8394712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8394929Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8395309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8395605Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8395801Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8396109Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8396191Z ^ 2025-12-04T10:35:19.8396586Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8396591Z 2025-12-04T10:35:19.8397212Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8397217Z 2025-12-04T10:35:19.8397220Z 2025-12-04T10:35:19.8397490Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8398188Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8398197Z 2025-12-04T10:35:19.8398428Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8398618Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8398709Z frames [('total', 1)] 2025-12-04T10:35:19.8398807Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8399012Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8399206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8399290Z graph_break [] 2025-12-04T10:35:19.8399473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8399564Z frames [('total', 1)] 2025-12-04T10:35:19.8399665Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8399857Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8400055Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8400138Z graph_break [] 2025-12-04T10:35:19.8400268Z =================================== FAILURES =================================== 2025-12-04T10:35:19.8400635Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:19.8400741Z Traceback (most recent call last): 2025-12-04T10:35:19.8401133Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8401341Z y_compiled = compiled_amax_fp8_quant(x, scale, 
amax_buffer_compiled) 2025-12-04T10:35:19.8401763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8401981Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8402423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8402591Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8403031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8403167Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8403630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8403907Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8404356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8404486Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8404898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8405004Z return self._compile_to_module() 2025-12-04T10:35:19.8405445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8405612Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8406062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8406174Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8406601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", 
line 3673, in load_by_key_path 2025-12-04T10:35:19.8406799Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8407474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8407588Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8408201Z File "/tmp/tmp0cdw70e8/fz/cfz4ycz2ldx27axtnofwsit4udseotqt5wvd7v6n6qkkfar4rkj3.py", line 168, in 2025-12-04T10:35:19.8408603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8408703Z kernel.precompile( 2025-12-04T10:35:19.8409180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8409282Z self._precompile_worker() 2025-12-04T10:35:19.8409795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8409950Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8410468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8410638Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8411024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8411234Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8411733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8412025Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8412222Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 
2025-12-04T10:35:19.8412531Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8412605Z ^ 2025-12-04T10:35:19.8413004Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8413008Z 2025-12-04T10:35:19.8413624Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8413629Z 2025-12-04T10:35:19.8413632Z 2025-12-04T10:35:19.8413818Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8414525Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8414530Z 2025-12-04T10:35:19.8414759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8414947Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8415036Z frames [('total', 1)] 2025-12-04T10:35:19.8415138Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8415345Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8415570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8415667Z graph_break [] 2025-12-04T10:35:19.8415868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8415964Z frames [('total', 1)] 2025-12-04T10:35:19.8416069Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8416261Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8416460Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8416549Z graph_break [] 2025-12-04T10:35:19.8416731Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8416820Z frames [('total', 1)] 2025-12-04T10:35:19.8416919Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8417227Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8417429Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8417522Z graph_break [] 2025-12-04T10:35:19.8418089Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.xml - 2025-12-04T10:35:19.8418239Z =========================== short test summary info ============================ 2025-12-04T10:35:19.8418931Z FAILED [0.4722s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8419309Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8419393Z ^ 2025-12-04T10:35:19.8419790Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8419794Z 2025-12-04T10:35:19.8420416Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8420420Z 2025-12-04T10:35:19.8420424Z 2025-12-04T10:35:19.8420613Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8421308Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8421401Z 2025-12-04T10:35:19.8421633Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8421791Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.8421967Z =================== 1 failed, 4 deselected, 2 rerun in 3.07s =================== 2025-12-04T10:35:19.8422051Z Got exit code 1 2025-12-04T10:35:19.8422143Z Retrying single test... 2025-12-04T10:35:19.8422569Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.xml 2025-12-04T10:35:19.8422710Z ============================= test session starts ============================== 2025-12-04T10:35:19.8423015Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.8423109Z cachedir: .pytest_cache 2025-12-04T10:35:19.8423571Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.8423682Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.8423777Z configfile: pytest.ini 2025-12-04T10:35:19.8424244Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.8424440Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.8425072Z stepcurrent: skipping 4 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8425178Z Running 1 items in this shard 2025-12-04T10:35:19.8425183Z 2025-12-04T10:35:19.8426199Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.8426891Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8427290Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T10:35:19.8427869Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8428357Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.8428841Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.8429226Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.8429730Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.8430178Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8430660Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 
= tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.8431096Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8431495Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.8431871Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.8432437Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.8432818Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.8433301Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.8433760Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.8434226Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.8434532Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8436236Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 
2025-12-04T10:35:19.8436706Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8437606Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8438146Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8438919Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8439502Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8440338Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8441001Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8441531Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8442217Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8442526Z E1204 10:17:58.460000 76203 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8443296Z E1204 10:17:58.460000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8443410Z ('RERUN', {'yellow': True}) [2.0976s] [100%] 2025-12-04T10:35:19.8444417Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.8445176Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8445623Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T10:35:19.8446094Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8446572Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.8447058Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.8447443Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.8447946Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.8448396Z E1204 10:17:58.966000 76203 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8448870Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.8449309Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8449707Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.8450081Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.8450571Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.8450944Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.8451507Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.8451958Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.8452432Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.8452740Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8454387Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 
16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8454853Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8455797Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8456416Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8457178Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8457768Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8458523Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8459230Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8459760Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8460446Z E1204 10:17:58.966000 76203 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8460765Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8461530Z E1204 10:17:58.966000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8461646Z ('RERUN', {'yellow': True}) [0.4733s] [100%] 2025-12-04T10:35:19.8462651Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.8463346Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8463746Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T10:35:19.8464294Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8464777Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.8465261Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.8465685Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.8466188Z E1204 10:17:59.437000 76203 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.8466634Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8467114Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.8467549Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8467957Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.8468409Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.8468897Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.8469275Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.8469759Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.8470221Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.8470685Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.8470990Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8472645Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': 
'*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8473108Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8474012Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8474553Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8475321Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8476061Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8476826Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8477485Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8478014Z E1204 10:17:59.437000 76203 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8478697Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8479009Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8479781Z E1204 10:17:59.437000 76203 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8479873Z FAILED [0.4697s] [100%] 2025-12-04T10:35:19.8479878Z 2025-12-04T10:35:19.8480088Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.8480377Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:19.8480484Z Traceback (most recent call last): 2025-12-04T10:35:19.8480883Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8481096Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8481523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8481743Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8482185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8482355Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8482808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8482932Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8483395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8483675Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8484137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8484265Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8484678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8484785Z return self._compile_to_module() 2025-12-04T10:35:19.8485206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8485349Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8485800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8485916Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8486352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8486637Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8487145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8487257Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8487712Z File "/tmp/tmpvrzblr75/si/csivzh63avudnamvcpszbph2ousqhcey6f465tkdhy7opfovkr7p.py", line 168, in 2025-12-04T10:35:19.8488123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8488220Z kernel.precompile( 
2025-12-04T10:35:19.8488703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8488809Z self._precompile_worker() 2025-12-04T10:35:19.8489323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8489484Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8490004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8490175Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8490563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8490854Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8491237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8491529Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8491729Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8492042Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8492119Z ^ 2025-12-04T10:35:19.8492514Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8492519Z 2025-12-04T10:35:19.8493137Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8493150Z 2025-12-04T10:35:19.8493154Z 2025-12-04T10:35:19.8493347Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8494050Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8494055Z 2025-12-04T10:35:19.8494286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8494477Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8494570Z frames [('total', 1)] 2025-12-04T10:35:19.8494671Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8494888Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8495078Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8495161Z graph_break [] 2025-12-04T10:35:19.8495481Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:19.8495619Z Traceback (most recent call last): 2025-12-04T10:35:19.8496007Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8496220Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8496639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8496943Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8497388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8497554Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8498002Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8498132Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8498597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8498873Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8499364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8499496Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8499915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8500024Z return self._compile_to_module() 2025-12-04T10:35:19.8500444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8500588Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8501118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8501229Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8501656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8501858Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8502368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8502481Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8502934Z File "/tmp/tmpi5dm35l0/p7/cp7v7zg5ov627f6dhxzehuuaoxtmo3ncyq2l7b25xaaenz3dsex2.py", line 168, in 2025-12-04T10:35:19.8503334Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8503440Z kernel.precompile( 2025-12-04T10:35:19.8503920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8504021Z self._precompile_worker() 2025-12-04T10:35:19.8504545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8504698Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8505220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8505416Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8505830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8506043Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8506427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8506717Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8506915Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8507225Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8507304Z ^ 2025-12-04T10:35:19.8508033Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8508040Z 2025-12-04T10:35:19.8508667Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8508672Z 2025-12-04T10:35:19.8508676Z 2025-12-04T10:35:19.8508864Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8509572Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8509577Z 2025-12-04T10:35:19.8509812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8509999Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8510094Z frames [('total', 1)] 2025-12-04T10:35:19.8510192Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8510403Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8510600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8510684Z graph_break [] 2025-12-04T10:35:19.8510870Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8510963Z frames [('total', 1)] 2025-12-04T10:35:19.8511063Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8511397Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8511598Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8511681Z graph_break [] 2025-12-04T10:35:19.8511821Z =================================== FAILURES =================================== 2025-12-04T10:35:19.8512109Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:19.8512214Z Traceback (most recent call last): 2025-12-04T10:35:19.8512612Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8512822Z y_compiled = compiled_amax_fp8_quant(x, scale, 
amax_buffer_compiled) 2025-12-04T10:35:19.8513247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8513461Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8513909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8514082Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8514521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8514650Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8515122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8515400Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8515849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8515976Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8516393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8516502Z return self._compile_to_module() 2025-12-04T10:35:19.8516918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8517063Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8517508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8517721Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8518151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", 
line 3673, in load_by_key_path 2025-12-04T10:35:19.8518350Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8518857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8518981Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8519426Z File "/tmp/tmpjq4t5iry/52/c523zshaeih26kv6egrdta67mhvi4uarmtxupydf5nsgcc5rtvf5.py", line 168, in <module> 2025-12-04T10:35:19.8519836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8519930Z kernel.precompile( 2025-12-04T10:35:19.8520414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8520522Z self._precompile_worker() 2025-12-04T10:35:19.8521037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8521192Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8521706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8521960Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8522351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8522562Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8522943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8523249Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8523447Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 
2025-12-04T10:35:19.8523760Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8523836Z ^ 2025-12-04T10:35:19.8524275Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8524288Z 2025-12-04T10:35:19.8525088Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8525095Z 2025-12-04T10:35:19.8525100Z 2025-12-04T10:35:19.8525354Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8526205Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8526210Z 2025-12-04T10:35:19.8526447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8526641Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8526735Z frames [('total', 1)] 2025-12-04T10:35:19.8526834Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8527045Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8527235Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8527324Z graph_break [] 2025-12-04T10:35:19.8527513Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8527601Z frames [('total', 1)] 2025-12-04T10:35:19.8527700Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8527893Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8528199Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8528287Z graph_break [] 2025-12-04T10:35:19.8528470Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8528557Z frames [('total', 1)] 2025-12-04T10:35:19.8528656Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8528844Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8529055Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8529143Z graph_break [] 2025-12-04T10:35:19.8529709Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.xml - 2025-12-04T10:35:19.8529861Z =========================== short test summary info ============================ 2025-12-04T10:35:19.8530558Z FAILED [0.4697s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8530870Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8530949Z ^ 2025-12-04T10:35:19.8531344Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8531348Z 2025-12-04T10:35:19.8531964Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8532047Z 2025-12-04T10:35:19.8532050Z 2025-12-04T10:35:19.8532239Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8532933Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8532942Z 2025-12-04T10:35:19.8533179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8533341Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.8533519Z ================== 1 failed, 187 deselected, 2 rerun in 3.08s ================== 2025-12-04T10:35:19.8533604Z Got exit code 1 2025-12-04T10:35:19.8533697Z Retrying single test... 2025-12-04T10:35:19.8534110Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.xml 2025-12-04T10:35:19.8534259Z ============================= test session starts ============================== 2025-12-04T10:35:19.8534571Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.8534668Z cachedir: .pytest_cache 2025-12-04T10:35:19.8535124Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.8535243Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.8535336Z configfile: pytest.ini 2025-12-04T10:35:19.8535851Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.8536049Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.8536675Z stepcurrent: skipping 4 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8536783Z Running 1 items in this shard 2025-12-04T10:35:19.8536788Z 2025-12-04T10:35:19.8537801Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.8538578Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8538983Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T10:35:19.8539506Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8540000Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.8540484Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.8540856Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.8541365Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.8541813Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8542285Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 
= tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.8542798Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8543200Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.8543577Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.8544067Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.8544441Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.8544923Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.8545399Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.8545899Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.8546207Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8547857Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 
2025-12-04T10:35:19.8548321Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8549235Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8549776Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8550634Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8551219Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8551978Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8552646Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8553174Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8553864Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8554177Z E1204 10:18:09.169000 76446 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8554946Z E1204 10:18:09.169000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8555167Z ('RERUN', {'yellow': True}) [2.0953s] [100%] 2025-12-04T10:35:19.8559762Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.8560481Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8560881Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T10:35:19.8561351Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8561841Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.8562327Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.8562695Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.8563205Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.8563657Z E1204 10:18:09.674000 76446 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8564127Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.8564568Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8564968Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.8565347Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.8566002Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.8566374Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.8566861Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.8567315Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.8567779Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.8568097Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8569744Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 
16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8570293Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8571190Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8571730Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8572496Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8573082Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8573843Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8574507Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8575038Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8575724Z E1204 10:18:09.674000 76446 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8576038Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8576867Z E1204 10:18:09.674000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8577029Z ('RERUN', {'yellow': True}) [0.4735s] [100%] 2025-12-04T10:35:19.8578382Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T10:35:19.8579163Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8579565Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T10:35:19.8580029Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8580516Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:19.8581000Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:19.8581377Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:19.8581880Z E1204 10:18:10.148000 76446 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T10:35:19.8582327Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8582801Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T10:35:19.8583397Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:19.8583799Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T10:35:19.8584174Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T10:35:19.8584662Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T10:35:19.8585036Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T10:35:19.8585568Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T10:35:19.8586024Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T10:35:19.8586487Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T10:35:19.8586793Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8588442Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': 
'*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8588907Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8589805Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8590421Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8591187Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8591868Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8592638Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8593299Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8593827Z E1204 10:18:10.148000 76446 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8594510Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8594818Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8595676Z E1204 10:18:10.148000 76446 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8595768Z FAILED [0.4719s] [100%] 2025-12-04T10:35:19.8595773Z 2025-12-04T10:35:19.8595903Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.8596192Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:19.8596307Z Traceback (most recent call last): 2025-12-04T10:35:19.8596701Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8596914Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8597335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8597564Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8598008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8598177Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8598617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8598748Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8599213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8599492Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8599945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8600077Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8600491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8600598Z return self._compile_to_module() 2025-12-04T10:35:19.8601015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8601156Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8601715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8601831Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8602262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8602462Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8602976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8603089Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8603539Z File "/tmp/tmpaa2nfmhq/kh/ckhxuv4xjnpquidpif7ji5k5ymvqhoaqeczyem62y5j6oxxc6j5y.py", line 168, in <module> 2025-12-04T10:35:19.8603943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8604037Z kernel.precompile( 
2025-12-04T10:35:19.8604523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8604626Z self._precompile_worker() 2025-12-04T10:35:19.8605142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8605300Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8605947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8606119Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8606513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8606724Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8607110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8607403Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8607601Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8608113Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8608198Z ^ 2025-12-04T10:35:19.8608597Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8608602Z 2025-12-04T10:35:19.8609219Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8609224Z 2025-12-04T10:35:19.8609228Z 2025-12-04T10:35:19.8609414Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8610125Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8610131Z 2025-12-04T10:35:19.8610362Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8610551Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8610644Z frames [('total', 1)] 2025-12-04T10:35:19.8610745Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8610954Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8611150Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8611236Z graph_break [] 2025-12-04T10:35:19.8611528Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:19.8611634Z Traceback (most recent call last): 2025-12-04T10:35:19.8612155Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8612369Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:19.8612788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8613010Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8613458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8613625Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8614068Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8614193Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8614658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8614945Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8615422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8615568Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8615991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8616204Z return self._compile_to_module() 2025-12-04T10:35:19.8616623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8616765Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8617215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8617332Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8617760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8617963Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8618471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8618590Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8619082Z File "/tmp/tmp9xykmy7e/z3/cz3bcryjjzoh3mc6awt2xnjrmwj3qds4eckfabakw2c4gjwbjwdt.py", line 168, in <module> 2025-12-04T10:35:19.8619483Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8619580Z kernel.precompile( 2025-12-04T10:35:19.8620058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8620165Z self._precompile_worker() 2025-12-04T10:35:19.8620682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8620835Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8621352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8621528Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8621914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8622127Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8622512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8622893Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8623094Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8623405Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8623488Z ^ 2025-12-04T10:35:19.8623884Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8623894Z 2025-12-04T10:35:19.8624508Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8624515Z 2025-12-04T10:35:19.8624519Z 2025-12-04T10:35:19.8624705Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8625436Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8625442Z 2025-12-04T10:35:19.8625695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8625883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8625974Z frames [('total', 1)] 2025-12-04T10:35:19.8626073Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8626277Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8626552Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8626637Z graph_break [] 2025-12-04T10:35:19.8626821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8626911Z frames [('total', 1)] 2025-12-04T10:35:19.8627008Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8627196Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8627405Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8627488Z graph_break [] 2025-12-04T10:35:19.8627617Z =================================== FAILURES =================================== 2025-12-04T10:35:19.8627904Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:19.8628011Z Traceback (most recent call last): 2025-12-04T10:35:19.8628404Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T10:35:19.8628618Z y_compiled = compiled_amax_fp8_quant(x, scale, 
amax_buffer_compiled) 2025-12-04T10:35:19.8629041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8629258Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8629702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8629879Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8630320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8630445Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8630912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8631195Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8631645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8631773Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8632188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8632295Z return self._compile_to_module() 2025-12-04T10:35:19.8632796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8632938Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8633387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8633498Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8633935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", 
line 3673, in load_by_key_path 2025-12-04T10:35:19.8634136Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8634642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8634755Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8635205Z File "/tmp/tmp8qn6ym91/rw/crwjyf4tpwijz7kinlm7r5t4ht7vt25aedwffhropawklw6k7ies.py", line 168, in <module> 2025-12-04T10:35:19.8635662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8635756Z kernel.precompile( 2025-12-04T10:35:19.8636235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8636418Z self._precompile_worker() 2025-12-04T10:35:19.8636934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8637087Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8637603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8637773Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8638170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8638382Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8638761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8639054Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8639261Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 
2025-12-04T10:35:19.8639574Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8639651Z ^ 2025-12-04T10:35:19.8640049Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8640054Z 2025-12-04T10:35:19.8640678Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8640682Z 2025-12-04T10:35:19.8640686Z 2025-12-04T10:35:19.8640872Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8641571Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8641580Z 2025-12-04T10:35:19.8641811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8641998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8642089Z frames [('total', 1)] 2025-12-04T10:35:19.8642187Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8642393Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8642583Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8642774Z graph_break [] 2025-12-04T10:35:19.8642963Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8643052Z frames [('total', 1)] 2025-12-04T10:35:19.8643150Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8643341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8643540Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8643633Z graph_break [] 2025-12-04T10:35:19.8643815Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8643902Z frames [('total', 1)] 2025-12-04T10:35:19.8644003Z stats [('calls_captured', 7)] 2025-12-04T10:35:19.8644190Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8644387Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.8644474Z graph_break [] 2025-12-04T10:35:19.8645049Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.xml - 2025-12-04T10:35:19.8645198Z =========================== short test summary info ============================ 2025-12-04T10:35:19.8645897Z FAILED [0.4719s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8646288Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8646366Z ^ 2025-12-04T10:35:19.8646762Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8646767Z 2025-12-04T10:35:19.8647383Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8647388Z 2025-12-04T10:35:19.8647396Z 2025-12-04T10:35:19.8647581Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8648277Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8648282Z 2025-12-04T10:35:19.8648513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8648678Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.8648857Z ================== 1 failed, 187 deselected, 2 rerun in 3.07s ================== 2025-12-04T10:35:19.8648944Z Got exit code 1 2025-12-04T10:35:19.8649432Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:19.8649796Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:19.8650212Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.xml 2025-12-04T10:35:19.8650357Z ============================= test session starts ============================== 2025-12-04T10:35:19.8650660Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.8650754Z cachedir: .pytest_cache 2025-12-04T10:35:19.8651215Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.8651323Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.8651415Z configfile: pytest.ini 2025-12-04T10:35:19.8651886Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:19.8652080Z collecting ... collected 188 items / 5 deselected / 183 selected 2025-12-04T10:35:19.8652292Z stepcurrent: skipping 5 already run items. 2025-12-04T10:35:19.8652391Z Running 183 items in this shard 2025-12-04T10:35:19.8652396Z 2025-12-04T10:35:19.8652833Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [1.8549s] [ 0%] 2025-12-04T10:35:19.8653269Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2483s] [ 1%] 2025-12-04T10:35:19.8653710Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.5542s] [ 1%] 2025-12-04T10:35:19.8654144Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2751s] [ 2%] 2025-12-04T10:35:19.8654589Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6147s] [ 2%] 2025-12-04T10:35:19.8655052Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda ('RERUN', {'yellow': True}) [0.4122s] [ 3%] 2025-12-04T10:35:19.8655514Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda ('RERUN', {'yellow': True}) [0.5546s] [ 3%] 2025-12-04T10:35:19.8655957Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda FAILED [0.5286s] [ 3%] 2025-12-04T10:35:19.8655961Z 2025-12-04T10:35:19.8656092Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.8656421Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T10:35:19.8656528Z Traceback (most recent call last): 2025-12-04T10:35:19.8656873Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.8657006Z y_compiled = 
compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.8657427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8657652Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8658096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8658265Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8658711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8658843Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8659358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8659640Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8660093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8660228Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8660641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8660749Z return self._compile_to_module() 2025-12-04T10:35:19.8661166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8661315Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8661762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8661874Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8662300Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8662500Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8663094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8663209Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8663651Z File "/tmp/tmpk65eknjk/zz/czzej76ui2htys4cgkxwwfhgvy4m3d62u4l5huiwadjiy4qnyo35.py", line 108, in <module> 2025-12-04T10:35:19.8664044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:19.8664147Z self._wait_futures(scope) 2025-12-04T10:35:19.8664574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:19.8664678Z kernel = result.result() 2025-12-04T10:35:19.8665057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:19.8665153Z return self.result_fn() 2025-12-04T10:35:19.8665572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:19.8665683Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:19.8666018Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:19.8666023Z 2025-12-04T10:35:19.8666166Z Name=triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.8666271Z Traceback (most recent call last): 2025-12-04T10:35:19.8666818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:19.8666903Z result = job() 2025-12-04T10:35:19.8667415Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:19.8667536Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:19.8668018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:19.8668121Z self._precompile_worker() 2025-12-04T10:35:19.8668635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8668788Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8669303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8669478Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8669865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8670074Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8670453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8670754Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8670915Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8671289Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8671364Z ^ 2025-12-04T10:35:19.8671760Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8671773Z 2025-12-04T10:35:19.8671776Z 2025-12-04T10:35:19.8672391Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8672396Z 2025-12-04T10:35:19.8672400Z 2025-12-04T10:35:19.8672586Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8673306Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8673312Z 2025-12-04T10:35:19.8673545Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8673732Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8673825Z frames [('total', 1)] 2025-12-04T10:35:19.8673924Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8674125Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8674435Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:19.8674520Z graph_break [] 2025-12-04T10:35:19.8674774Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T10:35:19.8674878Z Traceback (most recent call last): 2025-12-04T10:35:19.8675217Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.8675381Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.8675825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8676044Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8676487Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8676761Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8677203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8677328Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8677790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8678073Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8678522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8678651Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8679063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8679175Z return self._compile_to_module() 2025-12-04T10:35:19.8679674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8679816Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8680267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8680379Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8680811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8681014Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8681517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 
2025-12-04T10:35:19.8681628Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8682077Z File "/tmp/tmpo287x3l8/ub/cubfbnb4srqrag7nakprt3xgm2a6lmhbgvdblomju257dl33rb7i.py", line 108, in <module> 2025-12-04T10:35:19.8682466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:19.8682567Z self._wait_futures(scope) 2025-12-04T10:35:19.8682993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:19.8683092Z kernel = result.result() 2025-12-04T10:35:19.8683567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:19.8683667Z return self.result_fn() 2025-12-04T10:35:19.8684086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:19.8684197Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:19.8684528Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:19.8684538Z 2025-12-04T10:35:19.8684685Z Name=triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.8684791Z Traceback (most recent call last): 2025-12-04T10:35:19.8685261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:19.8685367Z result = job() 2025-12-04T10:35:19.8685902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:19.8686034Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:19.8686511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:19.8686611Z self._precompile_worker() 2025-12-04T10:35:19.8687131Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8687367Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8687886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8688062Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8688449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8688666Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8689046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8689342Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8689503Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8689873Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8689956Z ^ 2025-12-04T10:35:19.8690349Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8690354Z 2025-12-04T10:35:19.8690358Z 2025-12-04T10:35:19.8690975Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8690980Z 2025-12-04T10:35:19.8690983Z 2025-12-04T10:35:19.8691179Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8691817Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8691821Z 2025-12-04T10:35:19.8692055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8692248Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8692343Z frames [('total', 1)] 2025-12-04T10:35:19.8692444Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8692637Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8692950Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:19.8693034Z graph_break [] 2025-12-04T10:35:19.8693301Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8693395Z frames [('total', 1)] 2025-12-04T10:35:19.8693493Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8693690Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8694000Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:19.8694086Z graph_break [] 2025-12-04T10:35:19.8694220Z =================================== FAILURES =================================== 2025-12-04T10:35:19.8694472Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T10:35:19.8694579Z Traceback (most recent call last): 2025-12-04T10:35:19.8694924Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.8695056Z y_compiled = 
compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.8695482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8695697Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8696143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8696317Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8696756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8696960Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8697425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8697708Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8698163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8698297Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8698709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8698822Z return self._compile_to_module() 2025-12-04T10:35:19.8699289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8699447Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8699897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8700010Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8700440Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8700639Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8701160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8701274Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8701717Z File "/tmp/tmpnpiyswwt/z7/cz7gnv5vlsc2vpat3huyfnaqm534wiinp2ejsbz6n6lifd3462lp.py", line 108, in 2025-12-04T10:35:19.8702109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:19.8702210Z self._wait_futures(scope) 2025-12-04T10:35:19.8702640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:19.8702744Z kernel = result.result() 2025-12-04T10:35:19.8703124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:19.8703228Z return self.result_fn() 2025-12-04T10:35:19.8703731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:19.8703848Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:19.8704183Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:19.8704188Z 2025-12-04T10:35:19.8704331Z Name=triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.8704437Z Traceback (most recent call last): 2025-12-04T10:35:19.8704911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:19.8704997Z result = job() 2025-12-04T10:35:19.8705561Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:19.8705682Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:19.8706165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:19.8706267Z self._precompile_worker() 2025-12-04T10:35:19.8706779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8706940Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8707451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8707705Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8708237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8708448Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8708829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8709130Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8709299Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8709672Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8709747Z ^ 2025-12-04T10:35:19.8710144Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8710156Z 2025-12-04T10:35:19.8710159Z 2025-12-04T10:35:19.8710775Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8710779Z 2025-12-04T10:35:19.8710783Z 2025-12-04T10:35:19.8710971Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8711616Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8711620Z 2025-12-04T10:35:19.8711850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8712043Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8712133Z frames [('total', 1)] 2025-12-04T10:35:19.8712237Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8712432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8712741Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:19.8712824Z graph_break [] 2025-12-04T10:35:19.8713015Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8713107Z frames [('total', 1)] 2025-12-04T10:35:19.8713205Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8713517Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8713830Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:19.8713919Z graph_break [] 2025-12-04T10:35:19.8714107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8714195Z frames [('total', 1)] 2025-12-04T10:35:19.8714296Z stats 
[('calls_captured', 6)] 2025-12-04T10:35:19.8714491Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8714800Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:19.8714887Z graph_break [] 2025-12-04T10:35:19.8715490Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.xml - 2025-12-04T10:35:19.8715664Z =========================== short test summary info ============================ 2025-12-04T10:35:19.8716429Z FAILED [0.5286s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:19.8716435Z 2025-12-04T10:35:19.8716579Z Name=triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.8716686Z Traceback (most recent call last): 2025-12-04T10:35:19.8717158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:19.8717378Z result = job() 2025-12-04T10:35:19.8717894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:19.8718015Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:19.8718503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:19.8718608Z self._precompile_worker() 2025-12-04T10:35:19.8719122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8719284Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8719796Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8719980Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8720366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8720575Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8720961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8721256Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8721417Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8721786Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8721860Z ^ 2025-12-04T10:35:19.8722258Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8722267Z 2025-12-04T10:35:19.8722271Z 2025-12-04T10:35:19.8722886Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8722891Z 2025-12-04T10:35:19.8722894Z 2025-12-04T10:35:19.8723085Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8723809Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8723815Z 2025-12-04T10:35:19.8724050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8724206Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:35:19.8724391Z ============== 1 failed, 5 passed, 5 deselected, 2 rerun in 5.09s ==============
2025-12-04T10:35:19.8724480Z Got exit code 1
2025-12-04T10:35:19.8724576Z Retrying single test...
2025-12-04T10:35:19.8724983Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.xml
2025-12-04T10:35:19.8725128Z ============================= test session starts ==============================
2025-12-04T10:35:19.8725430Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8725528Z cachedir: .pytest_cache
2025-12-04T10:35:19.8725989Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8726097Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8726193Z configfile: pytest.ini
2025-12-04T10:35:19.8726663Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8726857Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.8727514Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8727619Z Running 1 items in this shard
2025-12-04T10:35:19.8727624Z
2025-12-04T10:35:19.8728617Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8729370Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8729742Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.8730123Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:19.8730570Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.8730977Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.8731436Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8731912Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8732413Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8732913Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8733403Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8733779Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.8734226Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8734714Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.8735107Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.8735516Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.8736096Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.8736549Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8737013Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.8737450Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8737952Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8738440Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.8738979Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.8739539Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.8739941Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.8740321Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.8740818Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.8741200Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.8741689Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.8742157Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.8742771Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.8743080Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.8744752Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8745221Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8746170Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8746788Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8747555Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8748144Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8748904Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8749570Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8750094Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8750848Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8751320Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8752094Z E1204 10:18:31.828000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8752211Z ('RERUN', {'yellow': True}) [1.6612s] [100%]
2025-12-04T10:35:19.8753202Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8753944Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8754320Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.8754702Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:19.8755143Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.8755564Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.8756053Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8756519Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8757020Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8757526Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8758013Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8758390Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.8758939Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8759353Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.8759746Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.8760139Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.8760694Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.8761146Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8761619Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.8762052Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8762556Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8763124Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.8763665Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.8764100Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.8764507Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.8764893Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.8765383Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.8765821Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.8766316Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.8766772Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.8767391Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.8767700Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.8769359Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.8769828Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.8770802Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8771343Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8772108Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8772701Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8773463Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8774128Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8774654Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.8775483Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8775844Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.8776623Z E1204 10:18:32.108000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8776741Z ('RERUN', {'yellow': True}) [0.2473s] [100%]
2025-12-04T10:35:19.8777719Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8778472Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8778837Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.8779266Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:19.8779711Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:19.8780104Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.8780564Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.8781028Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.8781538Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.8782039Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.8782600Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.8782981Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.8783425Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.8783831Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.8784226Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.8784609Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.8785159Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.8785641Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.8786125Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.8786550Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.8787130Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.8787619Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.8788154Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.8788597Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.8788999Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.8789385Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.8789879Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.8790255Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.8790750Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.8791211Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.8791822Z E1204
10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.8792128Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8793791Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8794330Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8795226Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8795819Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8796579Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8797170Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8797925Z E1204 10:18:32.356000 77059 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8798586Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8799187Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8799938Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8800254Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8801021Z E1204 10:18:32.356000 77059 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8801115Z FAILED [0.2465s] [100%] 2025-12-04T10:35:19.8801120Z 2025-12-04T10:35:19.8801245Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.8801504Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T10:35:19.8801612Z Traceback (most recent call last): 2025-12-04T10:35:19.8801956Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.8802093Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.8802513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8802737Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8803185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8803357Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8803802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8803932Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8804394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8804675Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8805127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8805369Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8805834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8805938Z 
return self._compile_to_module() 2025-12-04T10:35:19.8806362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8806502Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8806952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8807069Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8807497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8807704Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8808364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8808474Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8808925Z File "/tmp/tmp8lonxdkc/uv/cuva5jjko7irujcy6q4rp6idbwoefjob6vzinwzhcbqljuilcl6d.py", line 58, in 2025-12-04T10:35:19.8809329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8809597Z kernel.precompile( 2025-12-04T10:35:19.8810077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8810178Z self._precompile_worker() 2025-12-04T10:35:19.8810696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8810852Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8811371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8811547Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:19.8811933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8812150Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8812539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8812831Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8813033Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8813402Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8813482Z ^ 2025-12-04T10:35:19.8813883Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8813888Z 2025-12-04T10:35:19.8814505Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8814510Z 2025-12-04T10:35:19.8814518Z 2025-12-04T10:35:19.8814708Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8815355Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8815360Z 2025-12-04T10:35:19.8815593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8815781Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8815870Z frames [('total', 1)] 2025-12-04T10:35:19.8816083Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8816294Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8816493Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8816580Z graph_break [] 2025-12-04T10:35:19.8816832Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T10:35:19.8816941Z Traceback (most recent call last): 2025-12-04T10:35:19.8817287Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.8817420Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.8817844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8818060Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8818510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8818676Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8819160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:35:19.8819287Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8819747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8820114Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8820563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8820688Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8821106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8821215Z return self._compile_to_module() 2025-12-04T10:35:19.8821630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8821772Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8825874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8826019Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8826460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8826662Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8827175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8827285Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8827740Z File "/tmp/tmpp9380s2h/jb/cjbpyxcrwk6uaym3ltnvecck5fx7bzzsku5nxmv2fg3krjinvksr.py", line 58, in 2025-12-04T10:35:19.8828143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.8828239Z kernel.precompile( 2025-12-04T10:35:19.8828726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8828834Z self._precompile_worker() 2025-12-04T10:35:19.8829352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8829510Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8830029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8830315Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8830706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8830920Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8831304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8831598Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8831801Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8832175Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8832250Z ^ 2025-12-04T10:35:19.8832653Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8832659Z 2025-12-04T10:35:19.8833281Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8833286Z 2025-12-04T10:35:19.8833290Z 2025-12-04T10:35:19.8833481Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8834118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8834207Z 2025-12-04T10:35:19.8834440Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8834634Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8834722Z frames [('total', 1)] 2025-12-04T10:35:19.8834822Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8835031Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8835229Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8835317Z graph_break [] 2025-12-04T10:35:19.8835506Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8835605Z frames [('total', 1)] 2025-12-04T10:35:19.8835723Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8835935Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8836147Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8836235Z graph_break [] 2025-12-04T10:35:19.8836365Z =================================== FAILURES =================================== 2025-12-04T10:35:19.8836619Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T10:35:19.8836725Z Traceback (most recent call last): 2025-12-04T10:35:19.8837071Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.8837216Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.8837638Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8837851Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8838298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8838469Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8838916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8839041Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8839502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8839788Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8840326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8840457Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8840878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8840983Z return self._compile_to_module() 2025-12-04T10:35:19.8841408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8841549Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8841996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8842110Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8842543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:19.8842746Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8843252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8843362Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8843810Z File "/tmp/tmppxehlg09/fy/cfyfnvajlle66ucqnsevabypmshx5viix7a3tpd4lke4f4vrkqqa.py", line 58, in 2025-12-04T10:35:19.8844324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8850120Z kernel.precompile( 2025-12-04T10:35:19.8850618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8850722Z self._precompile_worker() 2025-12-04T10:35:19.8851253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8851410Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8851925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8852100Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8852496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8852706Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8853110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8853403Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8853603Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8853978Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8854056Z ^ 2025-12-04T10:35:19.8854452Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8854457Z 2025-12-04T10:35:19.8855076Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8855082Z 2025-12-04T10:35:19.8855086Z 2025-12-04T10:35:19.8855274Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8855971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8855978Z 2025-12-04T10:35:19.8856317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8856507Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8856599Z frames [('total', 1)] 2025-12-04T10:35:19.8856699Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8856907Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8857100Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8857187Z graph_break [] 2025-12-04T10:35:19.8857371Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8857469Z frames [('total', 1)] 2025-12-04T10:35:19.8857566Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8857759Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8857962Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8858044Z graph_break [] 2025-12-04T10:35:19.8858235Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:35:19.8858322Z frames [('total', 1)] 2025-12-04T10:35:19.8858422Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8858612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8858812Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8858898Z graph_break [] 2025-12-04T10:35:19.8859532Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.xml - 2025-12-04T10:35:19.8859741Z =========================== short test summary info ============================ 2025-12-04T10:35:19.8860464Z FAILED [0.2465s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8860835Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8860915Z ^ 2025-12-04T10:35:19.8861316Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8861321Z 2025-12-04T10:35:19.8861936Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8861944Z 2025-12-04T10:35:19.8861948Z 2025-12-04T10:35:19.8862144Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8862781Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8862789Z 2025-12-04T10:35:19.8863022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8863183Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.8863362Z ================== 1 failed, 187 deselected, 2 rerun in 2.19s ================== 2025-12-04T10:35:19.8863453Z Got exit code 1 2025-12-04T10:35:19.8863545Z Retrying single test... 2025-12-04T10:35:19.8863950Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.xml 2025-12-04T10:35:19.8864096Z ============================= test session starts ============================== 2025-12-04T10:35:19.8864398Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.8864495Z cachedir: .pytest_cache 2025-12-04T10:35:19.8864950Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.8865061Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.8865158Z configfile: pytest.ini 2025-12-04T10:35:19.8865765Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.8865965Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.8866538Z stepcurrent: skipping 10 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8866637Z Running 1 items in this shard 2025-12-04T10:35:19.8866641Z 2025-12-04T10:35:19.8867633Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.8868387Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8868765Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.8869142Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:19.8869587Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:19.8869983Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.8870544Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.8871231Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8871883Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.8872389Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T10:35:19.8872874Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.8873250Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.8873701Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.8874111Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.8874504Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.8874890Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.8875483Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.8875942Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8876407Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.8876840Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.8877341Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.8877927Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.8878469Z E1204 10:18:42.461000 77240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.8878903Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:19.8879307Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.8879690Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.8880182Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.8880567Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.8881058Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.8881519Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.8882129Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.8882483Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8884199Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], 
(2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8884662Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8885671Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8886220Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8886989Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8887573Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8888328Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8888991Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8889523Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8890384Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8890696Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8891466Z E1204 10:18:42.461000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8891585Z ('RERUN', {'yellow': True}) [1.6693s] [100%] 2025-12-04T10:35:19.8892573Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.8893323Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8893689Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.8894067Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:19.8894552Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:19.8894947Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.8895488Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.8895978Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * 
XBLOCK 2025-12-04T10:35:19.8896482Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.8896981Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.8897473Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.8897848Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.8898301Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.8898717Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.8899162Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.8899546Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.8900096Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.8900552Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8901018Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.8901443Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.8902030Z E1204 10:18:42.743000 77240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.8902522Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.8903059Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.8903496Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:19.8903900Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.8904283Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.8904775Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.8905159Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.8905650Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.8906158Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.8906770Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.8907128Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8908951Z E1204 10:18:42.743000 77240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8909416Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8910316Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8910858Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8911622Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8912207Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8912964Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8913639Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8914307Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8915057Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8915386Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8916189Z E1204 10:18:42.743000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8916308Z ('RERUN', {'yellow': True}) [0.2494s] [100%] 2025-12-04T10:35:19.8917291Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.8918040Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8918404Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.8918844Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:19.8919351Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:19.8919747Z E1204 10:18:42.993000 77240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.8920210Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.8920674Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.8921176Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.8921677Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.8922156Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.8922537Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.8922984Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.8923392Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.8923782Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.8924167Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.8924717Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.8925166Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 
tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.8925767Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.8926194Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.8926695Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.8927184Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.8927719Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.8928166Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:19.8928568Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.8928951Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.8929439Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.8929818Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.8930352Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.8930856Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.8931472Z E1204 
10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.8931780Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.8933435Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.8933903Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.8934804Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8935343Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.8936103Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8936693Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8937457Z E1204 10:18:42.993000 77240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8938300Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8938824Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.8939619Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8939930Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.8940698Z E1204 10:18:42.993000 77240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8940799Z FAILED [0.2485s] [100%] 2025-12-04T10:35:19.8940804Z 2025-12-04T10:35:19.8940931Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.8941185Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T10:35:19.8941291Z Traceback (most recent call last): 2025-12-04T10:35:19.8941631Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.8941818Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.8942238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8942506Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8942950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8943123Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8943569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.8943700Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8944160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8944448Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8944900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8945034Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8945465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8945577Z 
return self._compile_to_module() 2025-12-04T10:35:19.8946022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8946161Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8946616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8946729Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8947160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8947363Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8947871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8947984Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8948536Z File "/tmp/tmp9tcbijrv/xo/cxo33zxdzb3qc376pcuo6i3b6rmssudfx3eitm4empddy7gvcvqq.py", line 58, in <module> 2025-12-04T10:35:19.8948938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.8949037Z kernel.precompile( 2025-12-04T10:35:19.8949516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.8949619Z self._precompile_worker() 2025-12-04T10:35:19.8950138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.8950292Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.8950810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.8950982Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:19.8951372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.8951589Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.8951972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.8952265Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.8952510Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.8952879Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.8953006Z ^ 2025-12-04T10:35:19.8953402Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.8953407Z 2025-12-04T10:35:19.8954030Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.8954039Z 2025-12-04T10:35:19.8954043Z 2025-12-04T10:35:19.8954232Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.8954872Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T10:35:19.8954883Z 2025-12-04T10:35:19.8955118Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.8955308Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.8955421Z frames [('total', 1)] 2025-12-04T10:35:19.8955530Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.8955767Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.8955961Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.8956047Z graph_break [] 2025-12-04T10:35:19.8956306Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T10:35:19.8956419Z Traceback (most recent call last): 2025-12-04T10:35:19.8956759Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.8956896Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.8957314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.8957531Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.8957978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.8958148Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.8958586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:35:19.8958795Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.8959259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.8959542Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.8959994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.8960128Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.8960543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.8960649Z return self._compile_to_module() 2025-12-04T10:35:19.8961068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.8961214Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.8961658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.8961772Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.8962200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.8962399Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.8962953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.8963108Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.8963553Z File "/tmp/tmpjshhi764/zm/czmuberi25g4ahoncv7tyrwejxkrnzsnlto4j6mfvbr4wxi2cjlp.py", line 58, in <module> 2025-12-04T10:35:19.8963953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.8964054Z kernel.precompile(
2025-12-04T10:35:19.8964536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8964639Z self._precompile_worker()
2025-12-04T10:35:19.8965160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8965317Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8965831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8966012Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8966398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8966608Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8966995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8967286Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8967487Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8967855Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8967932Z ^
2025-12-04T10:35:19.8968331Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8968339Z
2025-12-04T10:35:19.8968953Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8968959Z
2025-12-04T10:35:19.8968963Z
2025-12-04T10:35:19.8969231Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8969870Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8969875Z
2025-12-04T10:35:19.8970108Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8970296Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8970390Z frames [('total', 1)]
2025-12-04T10:35:19.8970500Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8970713Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8970907Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8970994Z graph_break []
2025-12-04T10:35:19.8971190Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8971277Z frames [('total', 1)]
2025-12-04T10:35:19.8971378Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8971574Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8971777Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8971865Z graph_break []
2025-12-04T10:35:19.8971992Z =================================== FAILURES ===================================
2025-12-04T10:35:19.8972243Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____
2025-12-04T10:35:19.8972401Z Traceback (most recent call last):
2025-12-04T10:35:19.8972747Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.8972956Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.8973377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.8973594Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.8974045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.8974222Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.8974668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.8974801Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.8975267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.8975566Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.8976051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.8976180Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.8976607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.8976711Z return self._compile_to_module()
2025-12-04T10:35:19.8977127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.8977274Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.8977720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.8977833Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.8978274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.8978481Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.8979142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.8979257Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.8979700Z File "/tmp/tmp9sbz1deu/sc/cscuwzk2qxdyvwkgqlg6pvzlidnuaf6v26jmkbp6ofr6gsbbgyhc.py", line 58, in <module>
2025-12-04T10:35:19.8980106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.8980201Z kernel.precompile(
2025-12-04T10:35:19.8980693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.8980798Z self._precompile_worker()
2025-12-04T10:35:19.8981312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.8981476Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.8981997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.8982169Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.8982562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.8982774Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.8983241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.8983580Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.8983782Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8984211Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8984286Z ^
2025-12-04T10:35:19.8984688Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8984693Z
2025-12-04T10:35:19.8985311Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8985316Z
2025-12-04T10:35:19.8985319Z
2025-12-04T10:35:19.8985533Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8986208Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8986212Z
2025-12-04T10:35:19.8986446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8986642Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8986732Z frames [('total', 1)]
2025-12-04T10:35:19.8986837Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8987055Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8987246Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8987333Z graph_break []
2025-12-04T10:35:19.8987518Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8987606Z frames [('total', 1)]
2025-12-04T10:35:19.8987709Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8987901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8988105Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8988192Z graph_break []
2025-12-04T10:35:19.8988378Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.8988465Z frames [('total', 1)]
2025-12-04T10:35:19.8988564Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.8988751Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.8989035Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.8989122Z graph_break []
2025-12-04T10:35:19.8989684Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.xml -
2025-12-04T10:35:19.8989834Z =========================== short test summary info ============================
2025-12-04T10:35:19.8990454Z FAILED [0.2485s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.8990835Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8990917Z ^
2025-12-04T10:35:19.8991317Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.8991321Z
2025-12-04T10:35:19.8991944Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.8991948Z
2025-12-04T10:35:19.8991952Z
2025-12-04T10:35:19.8992139Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.8992778Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8992830Z
2025-12-04T10:35:19.8993061Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.8993219Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.8993444Z ================== 1 failed, 187 deselected, 2 rerun in 2.20s ==================
2025-12-04T10:35:19.8993530Z Got exit code 1
2025-12-04T10:35:19.8993969Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda
2025-12-04T10:35:19.8994332Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:19.8994742Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.xml
2025-12-04T10:35:19.8994899Z ============================= test session starts ==============================
2025-12-04T10:35:19.8995202Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.8995296Z cachedir: .pytest_cache
2025-12-04T10:35:19.8995762Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.8995872Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.8995967Z configfile: pytest.ini
2025-12-04T10:35:19.8996436Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.8996638Z collecting ... collected 188 items / 11 deselected / 177 selected
2025-12-04T10:35:19.8996764Z stepcurrent: skipping 11 already run items.
2025-12-04T10:35:19.8996863Z Running 177 items in this shard
2025-12-04T10:35:19.8996867Z
2025-12-04T10:35:19.8997865Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.8998621Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.8998993Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.8999456Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150
2025-12-04T10:35:19.8999903Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9000304Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.9000771Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9001240Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9001751Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9002258Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9002745Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9003121Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.9003571Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9004024Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9004462Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9004853Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9005411Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9005867Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9006334Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9006767Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9007268Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9007900Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9008477Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9008935Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9009355Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9009762Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.9010275Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9010672Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.9011308Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9011789Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9012439Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9012763Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9014559Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9015046Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9016050Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9016671Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9017503Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9018093Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9018850Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9019561Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9020090Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9020848Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9021159Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9021930Z E1204 10:18:53.066000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9022049Z ('RERUN', {'yellow': True}) [1.6732s] [ 0%]
2025-12-04T10:35:19.9023028Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9023866Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9024233Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.9024615Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150
2025-12-04T10:35:19.9025060Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9025483Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.9025979Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9026445Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9026956Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9027455Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9027939Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9028365Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.9028810Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9029270Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9029665Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9030047Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9030600Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9031051Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9031521Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9031954Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9032464Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9032953Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9033488Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9033928Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9034327Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9034711Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.9035199Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9035654Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.9036150Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9036609Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9037225Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9037539Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9039201Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9039662Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9040609Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9041205Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9041977Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9042565Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9043325Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9043991Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9044524Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9045272Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9045583Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9046403Z E1204 10:18:53.348000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9046525Z ('RERUN', {'yellow': True}) [0.2497s] [ 0%]
2025-12-04T10:35:19.9047509Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9048330Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9048696Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.9049083Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150
2025-12-04T10:35:19.9049534Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:19.9049929Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.9050396Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9050869Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9051372Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9051877Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9052395Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9052820Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.9053269Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9053691Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9054084Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9054463Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9055022Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:19.9055483Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9056000Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1])
2025-12-04T10:35:19.9056431Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9056930Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9057424Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf"))
2025-12-04T10:35:19.9057961Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9058409Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:19.9058816Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8
2025-12-04T10:35:19.9059367Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0
2025-12-04T10:35:19.9059867Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10)
2025-12-04T10:35:19.9060244Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0
2025-12-04T10:35:19.9060740Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12)
2025-12-04T10:35:19.9061201Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv)
2025-12-04T10:35:19.9061814Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None)
2025-12-04T10:35:19.9062130Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9063788Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9064293Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9065233Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9065833Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9066596Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9067187Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9067942Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9068615Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9069140Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9069885Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9070202Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9070973Z E1204 10:18:53.598000 77421 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9071068Z FAILED [0.2485s] [ 0%]
2025-12-04T10:35:19.9071073Z
2025-12-04T10:35:19.9071277Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9071538Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____
2025-12-04T10:35:19.9071652Z Traceback (most recent call last):
2025-12-04T10:35:19.9071994Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9072135Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9072559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9072777Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9073237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9073404Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9073854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9073980Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9074443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9074727Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9075223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9075351Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9075777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9075923Z
return self._compile_to_module() 2025-12-04T10:35:19.9076344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9076492Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9076944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9077061Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9077487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9077695Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9078201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9078318Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9078763Z File "/tmp/tmp8cc63eui/2g/c2guvru7lxggripjwctrfjt5hfi24ko4xenpsgegaxa6a7shmek5.py", line 58, in 2025-12-04T10:35:19.9079170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9079269Z kernel.precompile( 2025-12-04T10:35:19.9079755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9079856Z self._precompile_worker() 2025-12-04T10:35:19.9080372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9080535Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9081051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9081231Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:19.9081619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9081912Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9082301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9082595Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9082806Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9083176Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9083254Z ^ 2025-12-04T10:35:19.9083662Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9083669Z 2025-12-04T10:35:19.9084289Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9084294Z 2025-12-04T10:35:19.9084297Z 2025-12-04T10:35:19.9084491Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9085134Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9085139Z 2025-12-04T10:35:19.9085395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9085665Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9085753Z frames [('total', 1)] 2025-12-04T10:35:19.9085868Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9086081Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9086317Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9086413Z graph_break [] 2025-12-04T10:35:19.9086666Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T10:35:19.9086784Z Traceback (most recent call last): 2025-12-04T10:35:19.9087134Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9087267Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9087696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9087911Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9088363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9088538Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9088984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:35:19.9089119Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9089588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9089867Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9090327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9090458Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9090882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9090986Z return self._compile_to_module() 2025-12-04T10:35:19.9091404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9091554Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9092085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9092199Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9092632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9092832Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9093340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9093457Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9093889Z File "/tmp/tmpz5q_71gu/pf/cpfzckpbvomki4hkwk2wjfi737oqxeouq7ywzvhqfinnsxnnb73i.py", line 58, in 2025-12-04T10:35:19.9094294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.9094389Z kernel.precompile( 2025-12-04T10:35:19.9094874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9094974Z self._precompile_worker() 2025-12-04T10:35:19.9095515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9095697Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9096209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9096430Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9096816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9097072Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9097462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9097757Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9097957Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9098336Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9098411Z ^ 2025-12-04T10:35:19.9098815Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9098825Z 2025-12-04T10:35:19.9099485Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9099493Z 2025-12-04T10:35:19.9099497Z 2025-12-04T10:35:19.9099689Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9100342Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9100347Z 2025-12-04T10:35:19.9100579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9100770Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9100862Z frames [('total', 1)] 2025-12-04T10:35:19.9100964Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9101179Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9101374Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9101471Z graph_break [] 2025-12-04T10:35:19.9101655Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9101744Z frames [('total', 1)] 2025-12-04T10:35:19.9101845Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9102034Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9102350Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9102438Z graph_break [] 2025-12-04T10:35:19.9102565Z =================================== FAILURES =================================== 2025-12-04T10:35:19.9102818Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T10:35:19.9102925Z Traceback (most recent call last): 2025-12-04T10:35:19.9103267Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9103401Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9103820Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9104037Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9104497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9104661Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9105106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9105237Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9105723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9106081Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9106527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9106779Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9107197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9107308Z return self._compile_to_module() 2025-12-04T10:35:19.9107927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9108072Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9108583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9108735Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9113369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:19.9113602Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9114125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9114246Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9114703Z File "/tmp/tmpymccacl8/rk/crke2xdhbj3meedbntxg6czc4qz6r2p3qojjwhvghfnqs4frkgpw.py", line 58, in 2025-12-04T10:35:19.9115105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9115201Z kernel.precompile( 2025-12-04T10:35:19.9115688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9115790Z self._precompile_worker() 2025-12-04T10:35:19.9116308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9116464Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9116977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9117157Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9117704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9117927Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9118309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9118600Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9118805Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9119181Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9119262Z ^ 2025-12-04T10:35:19.9119661Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9119667Z 2025-12-04T10:35:19.9120285Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9120290Z 2025-12-04T10:35:19.9120293Z 2025-12-04T10:35:19.9120484Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9121124Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9121186Z 2025-12-04T10:35:19.9121424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9121613Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9121760Z frames [('total', 1)] 2025-12-04T10:35:19.9121865Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9122073Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9122272Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9122363Z graph_break [] 2025-12-04T10:35:19.9122552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9122643Z frames [('total', 1)] 2025-12-04T10:35:19.9122741Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9122930Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9123134Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9123221Z graph_break [] 2025-12-04T10:35:19.9123402Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:35:19.9123492Z frames [('total', 1)] 2025-12-04T10:35:19.9123598Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9123785Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9123989Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9124073Z graph_break [] 2025-12-04T10:35:19.9124649Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.xml - 2025-12-04T10:35:19.9124797Z =========================== short test summary info ============================ 2025-12-04T10:35:19.9125430Z FAILED [0.2485s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9125857Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9125933Z ^ 2025-12-04T10:35:19.9126333Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9126340Z 2025-12-04T10:35:19.9126953Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9127045Z 2025-12-04T10:35:19.9127050Z 2025-12-04T10:35:19.9127237Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9127882Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9127887Z 2025-12-04T10:35:19.9128117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9128285Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.9128456Z ================== 1 failed, 11 deselected, 2 rerun in 2.21s =================== 2025-12-04T10:35:19.9128548Z Got exit code 1 2025-12-04T10:35:19.9128646Z Retrying single test... 2025-12-04T10:35:19.9129052Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.xml 2025-12-04T10:35:19.9129205Z ============================= test session starts ============================== 2025-12-04T10:35:19.9129509Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.9129603Z cachedir: .pytest_cache 2025-12-04T10:35:19.9130059Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.9130169Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.9130307Z configfile: pytest.ini 2025-12-04T10:35:19.9130778Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.9131016Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.9131598Z stepcurrent: skipping 11 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9131696Z Running 1 items in this shard 2025-12-04T10:35:19.9131700Z 2025-12-04T10:35:19.9132690Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9133441Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9133811Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9134198Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.9134643Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.9135052Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9135511Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9136004Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9136528Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9137028Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T10:35:19.9137516Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9137972Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9138419Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9138827Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9139289Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9139673Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9140225Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.9140676Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9141145Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.9141570Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.9142070Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9142615Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.9143218Z E1204 10:19:03.696000 77602 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9143661Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:19.9144058Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.9144441Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.9144928Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.9145308Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.9145800Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.9146256Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.9146871Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.9147180Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9148845Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], 
(2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9149394Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9150291Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9150831Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9151596Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9152189Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9152946Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9153608Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9154132Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9154924Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9155272Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9156095Z E1204 10:19:03.696000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9156215Z ('RERUN', {'yellow': True}) [1.6996s] [100%] 2025-12-04T10:35:19.9157194Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9157944Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9158315Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9158712Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.9159157Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.9159549Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9160009Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9160475Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * 
XBLOCK 2025-12-04T10:35:19.9160979Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9161558Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9162038Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9162417Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9162861Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9163271Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9163663Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9164043Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9164603Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.9165050Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9165516Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.9166033Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.9166533Z E1204 10:19:03.981000 77602 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9167063Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.9167601Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9168040Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:19.9168437Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.9168821Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.9169308Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.9169685Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.9170180Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.9170638Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.9171246Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.9171556Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9173298Z E1204 10:19:03.981000 77602 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9173763Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9174654Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9175202Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9176013Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9176603Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9177356Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9178163Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9178907Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9179805Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9180119Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9180884Z E1204 10:19:03.981000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9181002Z ('RERUN', {'yellow': True}) [0.2515s] [100%] 2025-12-04T10:35:19.9181987Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9182732Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9183103Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9183481Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.9183929Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.9184326Z E1204 10:19:04.232000 77602 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9184788Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9185253Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9185846Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9186349Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9186825Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9187206Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9187652Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9188063Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9188456Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9188841Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9189394Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.9189840Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 
tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9190377Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.9190843Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.9191342Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9191838Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.9192471Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9192912Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:19.9193315Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.9193697Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.9194188Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.9194566Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.9195059Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.9195541Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.9196172Z E1204 
10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.9196483Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9198219Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9198687Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9199582Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9200126Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9200891Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9201478Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9202231Z E1204 10:19:04.232000 77602 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9202936Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9203504Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9204256Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9204572Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9205340Z E1204 10:19:04.232000 77602 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9205436Z FAILED [0.2495s] [100%] 2025-12-04T10:35:19.9205443Z 2025-12-04T10:35:19.9205569Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.9205822Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T10:35:19.9205933Z Traceback (most recent call last): 2025-12-04T10:35:19.9206278Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9206419Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9206838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9207056Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9207505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9207671Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9208321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9208453Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9209045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9209328Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9209778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9209906Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9210322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9210429Z 
return self._compile_to_module() 2025-12-04T10:35:19.9210849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9210992Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9211437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9211557Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9211984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9212188Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9212695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9212861Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9213284Z File "/tmp/tmpik_5wqao/ob/cob6n3um5rwdndqbfljtoc4j5vyujm37rrko3dab5nwzpjyhkffb.py", line 58, in 2025-12-04T10:35:19.9213683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9213835Z kernel.precompile( 2025-12-04T10:35:19.9214318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9214428Z self._precompile_worker() 2025-12-04T10:35:19.9214945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9215098Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9215661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9215838Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:19.9216225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9216439Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9216824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9217118Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9217318Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9217687Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9217762Z ^ 2025-12-04T10:35:19.9218161Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9218169Z 2025-12-04T10:35:19.9218783Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9218790Z 2025-12-04T10:35:19.9218794Z 2025-12-04T10:35:19.9218983Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9219776Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9219782Z 2025-12-04T10:35:19.9220018Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9220206Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9220295Z frames [('total', 1)] 2025-12-04T10:35:19.9220400Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9220606Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9220805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9220892Z graph_break [] 2025-12-04T10:35:19.9221143Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T10:35:19.9221251Z Traceback (most recent call last): 2025-12-04T10:35:19.9221593Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9221726Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9222156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9222370Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9222814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9222980Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9223464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:35:19.9223591Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9224102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9224381Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9224837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9224963Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9225376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9225505Z return self._compile_to_module() 2025-12-04T10:35:19.9225954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9226098Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9226541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9226655Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9227083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9227287Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9227795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9227906Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9228332Z File "/tmp/tmprc5k761_/lp/clps3gqidvhsma7uvwji6busb23skm6bjuhl65oruehcaryla2bh.py", line 58, in 2025-12-04T10:35:19.9228738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.9228835Z kernel.precompile( 2025-12-04T10:35:19.9229314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9229417Z self._precompile_worker() 2025-12-04T10:35:19.9230036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9230196Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9230708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9230878Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9231269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9231483Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9231865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9232156Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9232354Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9232731Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9232806Z ^ 2025-12-04T10:35:19.9233201Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9233209Z 2025-12-04T10:35:19.9233822Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9233869Z 2025-12-04T10:35:19.9233873Z 2025-12-04T10:35:19.9234060Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9234747Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9234752Z 2025-12-04T10:35:19.9234981Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9235174Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9235265Z frames [('total', 1)] 2025-12-04T10:35:19.9235364Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9235573Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9235764Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9235851Z graph_break [] 2025-12-04T10:35:19.9236038Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9236125Z frames [('total', 1)] 2025-12-04T10:35:19.9236224Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9236415Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9236617Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9236704Z graph_break [] 2025-12-04T10:35:19.9236829Z =================================== FAILURES =================================== 2025-12-04T10:35:19.9237084Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T10:35:19.9237193Z Traceback (most recent call last): 2025-12-04T10:35:19.9237535Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9237670Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9238089Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9238305Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9238750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9238916Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9239355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9239567Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9240028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9240308Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9240754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9240883Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9241303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9241410Z return self._compile_to_module() 2025-12-04T10:35:19.9241829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9241974Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9242418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9242533Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9242958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:19.9243157Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9243711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9243860Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9244299Z File "/tmp/tmpx340wjaq/mo/cmogrw5skvbbj2xu4hg6eqish63gmfdg6mr6bnmo42nj6xemg2j5.py", line 58, in 2025-12-04T10:35:19.9244700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9244798Z kernel.precompile( 2025-12-04T10:35:19.9245280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9245379Z self._precompile_worker() 2025-12-04T10:35:19.9245946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9246101Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9246614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9246789Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9247174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9247383Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9247771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9248060Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9248259Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9248627Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9248708Z ^ 2025-12-04T10:35:19.9249106Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9249114Z 2025-12-04T10:35:19.9249729Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9249734Z 2025-12-04T10:35:19.9249738Z 2025-12-04T10:35:19.9250012Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9250656Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9250661Z 2025-12-04T10:35:19.9250893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9251078Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9251171Z frames [('total', 1)] 2025-12-04T10:35:19.9251271Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9251478Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9251671Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9251757Z graph_break [] 2025-12-04T10:35:19.9251946Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9252036Z frames [('total', 1)] 2025-12-04T10:35:19.9252146Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9252336Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9252547Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9252633Z graph_break [] 2025-12-04T10:35:19.9252816Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:35:19.9252907Z frames [('total', 1)] 2025-12-04T10:35:19.9253050Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9253240Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9253445Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9253570Z graph_break [] 2025-12-04T10:35:19.9254138Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.xml - 2025-12-04T10:35:19.9254288Z =========================== short test summary info ============================ 2025-12-04T10:35:19.9254922Z FAILED [0.2495s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9255299Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9255377Z ^ 2025-12-04T10:35:19.9255773Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9255785Z 2025-12-04T10:35:19.9256397Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9256405Z 2025-12-04T10:35:19.9256408Z 2025-12-04T10:35:19.9256594Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9257242Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9257246Z 2025-12-04T10:35:19.9257479Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9257638Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.9257813Z ================== 1 failed, 187 deselected, 2 rerun in 2.23s ================== 2025-12-04T10:35:19.9257900Z Got exit code 1 2025-12-04T10:35:19.9257997Z Retrying single test... 2025-12-04T10:35:19.9258409Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.xml 2025-12-04T10:35:19.9258555Z ============================= test session starts ============================== 2025-12-04T10:35:19.9258856Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.9258950Z cachedir: .pytest_cache 2025-12-04T10:35:19.9259575Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.9259685Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.9259780Z configfile: pytest.ini 2025-12-04T10:35:19.9260254Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.9260448Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.9261027Z stepcurrent: skipping 11 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9261134Z Running 1 items in this shard 2025-12-04T10:35:19.9261138Z 2025-12-04T10:35:19.9262129Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9262882Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9263254Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9263688Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.9264135Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.9264575Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9265044Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9265510Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9266015Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9266517Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T10:35:19.9267003Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9267384Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9267836Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9268244Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9268637Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9269025Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9269583Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.9270034Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9270502Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.9271036Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.9271541Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9272109Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.9272653Z E1204 10:19:14.262000 77783 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9273095Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:19.9273492Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.9273881Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.9274371Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.9274744Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.9275288Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.9275795Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.9276446Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.9276761Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9278419Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], 
(2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9278887Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9279791Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9280333Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9281095Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9281684Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9282440Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9283182Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9283708Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9284459Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9284773Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9285565Z E1204 10:19:14.262000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9285705Z ('RERUN', {'yellow': True}) [1.6763s] [100%] 2025-12-04T10:35:19.9286689Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9287433Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9287919Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9288341Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.9288785Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.9289183Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9289646Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9290108Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * 
XBLOCK 2025-12-04T10:35:19.9290618Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9291116Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9291594Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9291979Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9292422Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9292829Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9293223Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9293605Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9294162Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.9294688Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9295159Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.9295595Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.9296129Z E1204 10:19:14.545000 77783 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9296622Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.9297159Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9297601Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:19.9297998Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.9298377Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.9298869Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.9299338Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.9299834Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.9300337Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.9300956Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.9301263Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9302917Z E1204 10:19:14.545000 77783 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9303394Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9304287Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9304831Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9305620Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9306232Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9307067Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9307869Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9308397Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9309142Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9309462Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9310233Z E1204 10:19:14.545000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9310350Z ('RERUN', {'yellow': True}) [0.2497s] [100%] 2025-12-04T10:35:19.9311327Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9312139Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9312585Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9312969Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:19.9313418Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:19.9313811Z E1204 10:19:14.795000 77783 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9314274Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9314741Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9315238Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9315742Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9316222Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9316601Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9317044Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9317450Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9317844Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9318226Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9318887Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:19.9319339Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 
tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9319802Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T10:35:19.9320238Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.9320739Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9321237Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T10:35:19.9321773Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9322213Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:19.9322609Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T10:35:19.9323039Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T10:35:19.9323530Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T10:35:19.9323945Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T10:35:19.9324445Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T10:35:19.9324903Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T10:35:19.9325534Z E1204 
10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T10:35:19.9325874Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9327530Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9328003Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9328893Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9329444Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9330207Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9330876Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9331632Z E1204 10:19:14.795000 77783 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9332291Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9332824Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9333569Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9333887Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9334652Z E1204 10:19:14.795000 77783 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9334745Z FAILED [0.2486s] [100%] 2025-12-04T10:35:19.9334792Z 2025-12-04T10:35:19.9334918Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.9335171Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T10:35:19.9335323Z Traceback (most recent call last): 2025-12-04T10:35:19.9335689Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9335843Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9336278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9336495Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9336941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9337109Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9337553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9337682Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9338146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9338442Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9338896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9339072Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9339497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9339602Z 
return self._compile_to_module() 2025-12-04T10:35:19.9340017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9340166Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9340611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9340731Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9341156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9341440Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9341951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9342061Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9342506Z File "/tmp/tmptaq29jvg/2a/c2aqw2alrfduad2mdb2ncjpize4q2h4xiirhhtewzphqzzoxshhs.py", line 58, in 2025-12-04T10:35:19.9342910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9343009Z kernel.precompile( 2025-12-04T10:35:19.9343494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9343598Z self._precompile_worker() 2025-12-04T10:35:19.9344115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9344283Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9344796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9344976Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:19.9345373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9345668Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9346051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9346383Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9346581Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9346954Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9347030Z ^ 2025-12-04T10:35:19.9347431Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9347436Z 2025-12-04T10:35:19.9348054Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9348062Z 2025-12-04T10:35:19.9348066Z 2025-12-04T10:35:19.9348256Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9348900Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9348906Z 2025-12-04T10:35:19.9349142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9349339Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9349433Z frames [('total', 1)] 2025-12-04T10:35:19.9349538Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9349744Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9349937Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9350029Z graph_break [] 2025-12-04T10:35:19.9350280Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T10:35:19.9350390Z Traceback (most recent call last): 2025-12-04T10:35:19.9350736Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9350870Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9351292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9351507Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9352039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9352213Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9352652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:35:19.9352778Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9353245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9353523Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9353976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9354101Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9354517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9354623Z return self._compile_to_module() 2025-12-04T10:35:19.9355039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9355187Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9355682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9355865Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9356294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9356535Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9357040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9357160Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9357608Z File "/tmp/tmp5dp7zlqg/wp/cwpnespmmlwlvhbagstjsstmqnc5p6ceiqwoai7lw3zk44qw3ava.py", line 58, in 2025-12-04T10:35:19.9358015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.9358112Z kernel.precompile( 2025-12-04T10:35:19.9358593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9358698Z self._precompile_worker() 2025-12-04T10:35:19.9359210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9359369Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9359884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9360056Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9360446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9360663Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9361046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9361341Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9361538Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9361910Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9361985Z ^ 2025-12-04T10:35:19.9362459Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9362465Z 2025-12-04T10:35:19.9363086Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9363091Z 2025-12-04T10:35:19.9363095Z 2025-12-04T10:35:19.9363283Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9363933Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9363938Z 2025-12-04T10:35:19.9364168Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9364362Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9364451Z frames [('total', 1)] 2025-12-04T10:35:19.9364552Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9364770Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9364964Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9365048Z graph_break [] 2025-12-04T10:35:19.9365237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9365329Z frames [('total', 1)] 2025-12-04T10:35:19.9365427Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9365665Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9365868Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9365955Z graph_break [] 2025-12-04T10:35:19.9366080Z =================================== FAILURES =================================== 2025-12-04T10:35:19.9366375Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T10:35:19.9366484Z Traceback (most recent call last): 2025-12-04T10:35:19.9366831Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9366967Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9367397Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9367610Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9368059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9368227Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9368671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9368806Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9369272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9369558Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9370011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9370137Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9370552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9370664Z return self._compile_to_module() 2025-12-04T10:35:19.9371080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9371230Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9371672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9371794Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9372306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:19.9372509Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9373018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9373133Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9373583Z File "/tmp/tmprr3o4bwn/fa/cfacopmfzqejrwnkyt657ywnsombld27bdowi7qrlsxiz3ur4tov.py", line 58, in 2025-12-04T10:35:19.9373983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9374081Z kernel.precompile( 2025-12-04T10:35:19.9374563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9374662Z self._precompile_worker() 2025-12-04T10:35:19.9375185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9375345Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9375857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9376074Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9376460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9376674Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9377101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9377392Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9377600Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9377975Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9378054Z ^ 2025-12-04T10:35:19.9378453Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9378461Z 2025-12-04T10:35:19.9379143Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9379148Z 2025-12-04T10:35:19.9379153Z 2025-12-04T10:35:19.9379347Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9379990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9379999Z 2025-12-04T10:35:19.9380231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9380420Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9380509Z frames [('total', 1)] 2025-12-04T10:35:19.9380610Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9380818Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9381016Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9381104Z graph_break [] 2025-12-04T10:35:19.9381289Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9381379Z frames [('total', 1)] 2025-12-04T10:35:19.9381481Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9381669Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9381876Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9382045Z graph_break [] 2025-12-04T10:35:19.9382231Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:35:19.9382324Z frames [('total', 1)] 2025-12-04T10:35:19.9382421Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9382609Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9382812Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9382900Z graph_break [] 2025-12-04T10:35:19.9383466Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.xml - 2025-12-04T10:35:19.9383621Z =========================== short test summary info ============================ 2025-12-04T10:35:19.9384248Z FAILED [0.2486s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9384628Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9384703Z ^ 2025-12-04T10:35:19.9385100Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9385105Z 2025-12-04T10:35:19.9385772Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9385822Z 2025-12-04T10:35:19.9385826Z 2025-12-04T10:35:19.9386014Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9386695Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9386699Z 2025-12-04T10:35:19.9386932Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9387099Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.9387273Z ================== 1 failed, 187 deselected, 2 rerun in 2.21s ================== 2025-12-04T10:35:19.9387359Z Got exit code 1 2025-12-04T10:35:19.9387795Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T10:35:19.9388156Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:19.9388565Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.xml 2025-12-04T10:35:19.9388715Z ============================= test session starts ============================== 2025-12-04T10:35:19.9389016Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.9389113Z cachedir: .pytest_cache 2025-12-04T10:35:19.9389570Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.9393501Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.9393622Z configfile: pytest.ini 2025-12-04T10:35:19.9394098Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.9394303Z collecting ... collected 188 items / 12 deselected / 176 selected 2025-12-04T10:35:19.9394434Z stepcurrent: skipping 12 already run items. 
2025-12-04T10:35:19.9394534Z Running 176 items in this shard 2025-12-04T10:35:19.9394539Z 2025-12-04T10:35:19.9395590Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T10:35:19.9396483Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9396856Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9397229Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T10:35:19.9397678Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T10:35:19.9398084Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9398547Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9399021Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9399520Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9400023Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9400554Z E1204 10:19:25.115000 77964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9400934Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9401433Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9401845Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9402240Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9402625Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9403132Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:19.9403588Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9404057Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:19.9404553Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9405048Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:19.9405634Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9406072Z E1204 10:19:25.115000 77964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:19.9406473Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:19.9406857Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:19.9407343Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:19.9408030Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:19.9408523Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:19.9408969Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:19.9409577Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:19.9409877Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9411532Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9411990Z E1204 
10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9412934Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9413528Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9414285Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9414872Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9415616Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9416276Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9416791Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9417530Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9417842Z E1204 10:19:25.115000 77964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9418597Z E1204 10:19:25.115000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9418708Z ('RERUN', {'yellow': True}) [1.9521s] [ 0%] 2025-12-04T10:35:19.9419747Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T10:35:19.9420586Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9420947Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9421313Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T10:35:19.9421746Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T10:35:19.9422129Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9422583Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9423045Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9423533Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, 
None] 2025-12-04T10:35:19.9424024Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9424531Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9424902Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9425393Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9425824Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9426210Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9426580Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9427076Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:19.9427516Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9427973Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:19.9428462Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9428939Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:19.9429470Z E1204 10:19:25.516000 77964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9429897Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:19.9430286Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:19.9430650Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:19.9431206Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:19.9431575Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:19.9432056Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:19.9432505Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:19.9433106Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:19.9433403Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9435053Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], 
(2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9435555Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9436479Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9437048Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9437806Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9438379Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9439130Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9439778Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9440294Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9441029Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9441329Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9442089Z E1204 10:19:25.516000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9442197Z ('RERUN', {'yellow': True}) [0.3685s] [ 0%] 2025-12-04T10:35:19.9443282Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T10:35:19.9444013Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9444370Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9444736Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T10:35:19.9445162Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T10:35:19.9445553Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9445999Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9446453Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * 
XBLOCK 2025-12-04T10:35:19.9446943Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9447432Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9447952Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9448357Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9448791Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9449189Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9449570Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9449952Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9450453Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:19.9450897Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9451351Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:19.9451841Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9452323Z E1204 10:19:25.884000 77964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:19.9452844Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9453275Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:19.9453661Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:19.9454028Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:19.9454582Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:19.9454946Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:19.9455448Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:19.9455924Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:19.9456524Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:19.9456827Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9458473Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 
'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9458972Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9459903Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9460485Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9461234Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9461812Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9462558Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9463206Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9463727Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 
2025-12-04T10:35:19.9464457Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9464759Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9465517Z E1204 10:19:25.884000 77964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9465625Z FAILED [0.3667s] [ 0%] 2025-12-04T10:35:19.9465630Z 2025-12-04T10:35:19.9465759Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.9466016Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T10:35:19.9466275Z Traceback (most recent call last): 2025-12-04T10:35:19.9466607Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9466733Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9467148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9467357Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9467797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9467956Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9468385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9468508Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9468969Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9469240Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9469681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9469846Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9470254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9470351Z return self._compile_to_module() 2025-12-04T10:35:19.9470806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9470940Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9471383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9471490Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9471904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9472096Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9472595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9472702Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9473150Z File "/tmp/tmpulvykvmu/oa/coauqkvaipwywfcbw5iluza47wxrwaoxbco5tvf7uqjyyv5ziqiz.py", line 113, in 2025-12-04T10:35:19.9473540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9473625Z kernel.precompile( 2025-12-04T10:35:19.9474098Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9474192Z self._precompile_worker() 2025-12-04T10:35:19.9474697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9474841Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9475345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9475516Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9475995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9476274Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9476875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9477224Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9477553Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9477942Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9478014Z ^ 2025-12-04T10:35:19.9478434Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9478439Z 2025-12-04T10:35:19.9479087Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9479096Z 2025-12-04T10:35:19.9479100Z 2025-12-04T10:35:19.9479294Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9479992Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9479996Z 2025-12-04T10:35:19.9480241Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9480440Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9480530Z frames [('total', 1)] 2025-12-04T10:35:19.9480684Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9480880Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9481066Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9481216Z graph_break [] 2025-12-04T10:35:19.9481460Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T10:35:19.9481562Z Traceback (most recent call last): 2025-12-04T10:35:19.9481903Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9482028Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9482439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9482643Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9483074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9483238Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9483666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:35:19.9483789Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9484240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9484516Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9484958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9485078Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9485512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9485635Z return self._compile_to_module() 2025-12-04T10:35:19.9486039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9486176Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9486612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9486717Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9487215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9487409Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9487910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9488016Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9488460Z File "/tmp/tmphwflrjpa/sw/cswj7egzn2q73olgfhdyzu4eylzehnbazgcsdyqiil4cwohbgutv.py", line 113, in 2025-12-04T10:35:19.9488851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.9488946Z kernel.precompile( 2025-12-04T10:35:19.9489414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9489512Z self._precompile_worker() 2025-12-04T10:35:19.9490116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9490270Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9490774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9490936Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9491366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9491567Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9491978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9492257Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9492451Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9492813Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9492884Z ^ 2025-12-04T10:35:19.9493269Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9493283Z 2025-12-04T10:35:19.9493893Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9493898Z 2025-12-04T10:35:19.9493902Z 2025-12-04T10:35:19.9494086Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9494729Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9494734Z 2025-12-04T10:35:19.9494958Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9495140Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9495223Z frames [('total', 1)] 2025-12-04T10:35:19.9495320Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9495540Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9495753Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9495833Z graph_break [] 2025-12-04T10:35:19.9496013Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9496101Z frames [('total', 1)] 2025-12-04T10:35:19.9496194Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9496373Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9496563Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9496645Z graph_break [] 2025-12-04T10:35:19.9496841Z =================================== FAILURES =================================== 2025-12-04T10:35:19.9497085Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T10:35:19.9497189Z Traceback (most recent call last): 2025-12-04T10:35:19.9497520Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9497654Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9498062Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9498269Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9498706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9498868Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9499358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9499482Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9499932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9500204Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9500686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9500805Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9501250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9501348Z return self._compile_to_module() 2025-12-04T10:35:19.9501769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9501902Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9502334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9502442Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9502859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:19.9503051Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9503548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9503653Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9504095Z File "/tmp/tmpcq6xxgnx/u5/cu5odgkkqj2qt5iku45hojz2nksqecnxb6sqwnwvdt2w4474rj6b.py", line 113, in 2025-12-04T10:35:19.9504487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9504577Z kernel.precompile( 2025-12-04T10:35:19.9505048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9505143Z self._precompile_worker() 2025-12-04T10:35:19.9505649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9505802Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9506304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9506472Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9506847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9507125Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9507502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9507934Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9508133Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9508604Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9508714Z ^ 2025-12-04T10:35:19.9509193Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9509202Z 2025-12-04T10:35:19.9509807Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9509818Z 2025-12-04T10:35:19.9509822Z 2025-12-04T10:35:19.9510004Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9510646Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9510651Z 2025-12-04T10:35:19.9510873Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9511144Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9511226Z frames [('total', 1)] 2025-12-04T10:35:19.9511322Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9511588Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9511773Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9511854Z graph_break [] 2025-12-04T10:35:19.9512030Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9512113Z frames [('total', 1)] 2025-12-04T10:35:19.9512208Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9512391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9512584Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9512662Z graph_break [] 2025-12-04T10:35:19.9512837Z 
----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9512925Z frames [('total', 1)]
2025-12-04T10:35:19.9513016Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9513197Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9513393Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9513469Z graph_break []
2025-12-04T10:35:19.9514032Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.xml -
2025-12-04T10:35:19.9514174Z =========================== short test summary info ============================
2025-12-04T10:35:19.9514795Z FAILED [0.3667s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9515163Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9515235Z ^
2025-12-04T10:35:19.9515672Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9515680Z
2025-12-04T10:35:19.9516284Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9516289Z
2025-12-04T10:35:19.9516293Z
2025-12-04T10:35:19.9516584Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9517229Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9517233Z
2025-12-04T10:35:19.9517456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9517607Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:19.9517772Z ================== 1 failed, 12 deselected, 2 rerun in 2.72s ===================
2025-12-04T10:35:19.9517852Z Got exit code 1
2025-12-04T10:35:19.9517941Z Retrying single test...
2025-12-04T10:35:19.9518339Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.xml
2025-12-04T10:35:19.9518471Z ============================= test session starts ==============================
2025-12-04T10:35:19.9518768Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:19.9518854Z cachedir: .pytest_cache
2025-12-04T10:35:19.9519300Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:19.9519399Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:19.9519486Z configfile: pytest.ini
2025-12-04T10:35:19.9519944Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:19.9520200Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:19.9520775Z stepcurrent: skipping 12 already run items.
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9520908Z Running 1 items in this shard
2025-12-04T10:35:19.9520913Z
2025-12-04T10:35:19.9521905Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9522645Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9523013Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.9523377Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5
2025-12-04T10:35:19.9523805Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9524189Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.9524642Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9525095Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9525633Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9526133Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9526607Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9526971Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.9527482Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9527881Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9528261Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9528638Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9529132Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9529570Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9530034Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9530517Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9530995Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9531575Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9531999Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9532427Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9532797Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.9533274Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9533636Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.9534115Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9534675Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9535276Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9535613Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9537284Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9537742Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9538717Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9539303Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9540057Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9540635Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9541383Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9542044Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9542575Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9543313Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9543665Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9544421Z E1204 10:19:35.784000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9544571Z ('RERUN', {'yellow': True}) [1.9420s] [100%]
2025-12-04T10:35:19.9545565Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9546299Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9546664Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.9547025Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5
2025-12-04T10:35:19.9547467Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9547855Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.9548306Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9548765Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9549256Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9549754Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9550228Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9550672Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.9551115Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9551509Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9551901Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9552272Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9552769Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9553213Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9553670Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9554159Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9554635Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9555208Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9555720Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9556117Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9556493Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.9556977Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9557343Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.9557834Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9558283Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9558888Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9559192Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9560844Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9561297Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9562300Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9562832Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9563583Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9564164Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9564908Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9565565Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9566083Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9566913Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9567263Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9568062Z E1204 10:19:36.187000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9568173Z ('RERUN', {'yellow': True}) [0.3708s] [100%]
2025-12-04T10:35:19.9569153Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:19.9569891Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9570248Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.9570617Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5
2025-12-04T10:35:19.9571042Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8
2025-12-04T10:35:19.9571429Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.9571883Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9572333Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9572827Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9573316Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9573780Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9574234Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:19.9574670Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9575070Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9575468Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9575873Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9576376Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:19.9576815Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9577279Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9577771Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9578247Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:19.9578817Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:19.9579361Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:19.9579754Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9580134Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.9580610Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9580978Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.9581460Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9581912Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9582508Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9582822Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9584467Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9584930Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9585946Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9586477Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9587232Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9587813Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9588571Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9589231Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9589751Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9590485Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9590829Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9591635Z E1204 10:19:36.558000 78177 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9591721Z FAILED [0.3699s] [100%]
2025-12-04T10:35:19.9591730Z
2025-12-04T10:35:19.9591850Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9592097Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9592199Z Traceback (most recent call last):
2025-12-04T10:35:19.9592535Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9592664Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9593079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9593293Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9593728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9593900Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9594334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9594452Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9594914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9595184Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9595635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9595760Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9596166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9596266Z return self._compile_to_module()
2025-12-04T10:35:19.9596758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9596901Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9597342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9597450Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9597875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9598069Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9598571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9598675Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9599119Z File "/tmp/tmpeers9ivh/dg/cdgnyp7jueorxhm6ynrxygmlo2o76gxsd3mrhkxp3dth2arpjh5u.py", line 113, in <module>
2025-12-04T10:35:19.9599518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9599607Z kernel.precompile(
2025-12-04T10:35:19.9600073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9600173Z self._precompile_worker()
2025-12-04T10:35:19.9600726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9600882Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9601425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9601587Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9601974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9602175Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9602549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9602830Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9603024Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9603388Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9603456Z ^
2025-12-04T10:35:19.9603846Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9603851Z
2025-12-04T10:35:19.9604462Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9604467Z
2025-12-04T10:35:19.9604470Z
2025-12-04T10:35:19.9604648Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9605298Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9605303Z
2025-12-04T10:35:19.9605528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9605738Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9605828Z frames [('total', 1)]
2025-12-04T10:35:19.9605941Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9606149Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9606336Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9606414Z graph_break []
2025-12-04T10:35:19.9606770Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9606874Z Traceback (most recent call last):
2025-12-04T10:35:19.9607212Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9607334Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9607900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9608120Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9608557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9608719Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9609155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9609284Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9609736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9610004Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9610438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9610629Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9611035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9611190Z return self._compile_to_module()
2025-12-04T10:35:19.9611598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9611735Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9612181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9612285Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9612699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9612896Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9613393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9613497Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9613933Z File "/tmp/tmphuw91yu4/ew/cew2syydjpk5ch5yn4fvdwwhohx4q5otnd33lfa2qlgqzsm3raae.py", line 113, in <module>
2025-12-04T10:35:19.9614330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9614428Z kernel.precompile(
2025-12-04T10:35:19.9614898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9614991Z self._precompile_worker()
2025-12-04T10:35:19.9615499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9615673Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9616206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9616369Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9616747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9616953Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9617438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9617729Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9617923Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9618284Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:19.9618362Z ^
2025-12-04T10:35:19.9618752Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9618757Z
2025-12-04T10:35:19.9619409Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9619414Z
2025-12-04T10:35:19.9619418Z
2025-12-04T10:35:19.9619605Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9620246Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda
2025-12-04T10:35:19.9620257Z
2025-12-04T10:35:19.9620480Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9620664Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9620802Z frames [('total', 1)]
2025-12-04T10:35:19.9620897Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9621092Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9621318Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9621402Z graph_break []
2025-12-04T10:35:19.9621587Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9621667Z frames [('total', 1)]
2025-12-04T10:35:19.9621765Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9621951Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9622141Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:19.9622220Z graph_break []
2025-12-04T10:35:19.9622348Z =================================== FAILURES ===================================
2025-12-04T10:35:19.9622588Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____
2025-12-04T10:35:19.9622689Z Traceback (most recent call last):
2025-12-04T10:35:19.9623028Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9623153Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9623567Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9623775Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9624211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9624372Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9624800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9624923Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9625379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9625652Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9626098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9626219Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9626703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9626812Z return self._compile_to_module() 2025-12-04T10:35:19.9627219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9627355Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9627794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9627905Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9628322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:19.9628519Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9629018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9629123Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9629562Z File "/tmp/tmpvvfj7qf6/er/cerelih2buxbfk4bhpgjxaygp5h5rr6ur2bvva3wflcr7p7hmm2m.py", line 113, in 2025-12-04T10:35:19.9629956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9630043Z kernel.precompile( 2025-12-04T10:35:19.9630557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9630653Z self._precompile_worker() 2025-12-04T10:35:19.9631156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9631352Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9631858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9632022Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9632404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9632606Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9632982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9633268Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9633458Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9633822Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9633889Z ^ 2025-12-04T10:35:19.9634287Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9634296Z 2025-12-04T10:35:19.9634896Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9634901Z 2025-12-04T10:35:19.9634905Z 2025-12-04T10:35:19.9635081Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9635775Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9635780Z 2025-12-04T10:35:19.9636001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9636190Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9636270Z frames [('total', 1)] 2025-12-04T10:35:19.9636362Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9636726Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9636911Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9636992Z graph_break [] 2025-12-04T10:35:19.9637169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9637253Z frames [('total', 1)] 2025-12-04T10:35:19.9637353Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9637537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9637729Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9637812Z graph_break [] 2025-12-04T10:35:19.9637991Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9638075Z frames [('total', 1)] 2025-12-04T10:35:19.9638180Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9638362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9638559Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9638637Z graph_break [] 2025-12-04T10:35:19.9639192Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.xml - 2025-12-04T10:35:19.9639336Z =========================== short test summary info ============================ 2025-12-04T10:35:19.9639959Z FAILED [0.3699s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9640369Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9640511Z ^ 2025-12-04T10:35:19.9640900Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9640905Z 2025-12-04T10:35:19.9641520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9641525Z 2025-12-04T10:35:19.9641530Z 2025-12-04T10:35:19.9641713Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9642357Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9642365Z 2025-12-04T10:35:19.9642585Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9642738Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.9642913Z ================== 1 failed, 187 deselected, 2 rerun in 2.72s ================== 2025-12-04T10:35:19.9642997Z Got exit code 1 2025-12-04T10:35:19.9643087Z Retrying single test... 2025-12-04T10:35:19.9643503Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.xml 2025-12-04T10:35:19.9643637Z ============================= test session starts ============================== 2025-12-04T10:35:19.9643935Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.9644021Z cachedir: .pytest_cache 2025-12-04T10:35:19.9644467Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.9644583Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.9644675Z configfile: pytest.ini 2025-12-04T10:35:19.9645130Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.9645329Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.9646028Z stepcurrent: skipping 12 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9646134Z Running 1 items in this shard 2025-12-04T10:35:19.9646139Z 2025-12-04T10:35:19.9647122Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T10:35:19.9647864Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9648228Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9648604Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T10:35:19.9649046Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T10:35:19.9649432Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9649893Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9650390Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9650879Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9651420Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T10:35:19.9651890Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9652262Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9652698Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9653095Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9653484Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9653858Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9654357Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:19.9654799Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9655254Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:19.9655747Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9656234Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:19.9656770Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9657196Z E1204 10:19:46.433000 78392 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:19.9657697Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:19.9658096Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:19.9658605Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:19.9659004Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:19.9659540Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:19.9659998Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:19.9660599Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:19.9660899Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9662555Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9663090Z E1204 
10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9663983Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9664520Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9665281Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9665908Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9666662Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9667312Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9667825Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9668573Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9668875Z E1204 10:19:46.433000 78392 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9669714Z E1204 10:19:46.433000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9669821Z ('RERUN', {'yellow': True}) [1.9424s] [100%] 2025-12-04T10:35:19.9670800Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T10:35:19.9671543Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9671899Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9672269Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T10:35:19.9672699Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T10:35:19.9673086Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9673539Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9674040Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9674544Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, 
None] 2025-12-04T10:35:19.9675076Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9675550Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9675920Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9676352Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9676762Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9677145Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9677526Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9678023Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:19.9678461Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9682591Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:19.9683082Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9683580Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:19.9684105Z E1204 10:19:46.836000 78392 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9684639Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:19.9685029Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:19.9685397Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:19.9685925Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:19.9686293Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:19.9686777Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:19.9687224Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:19.9687828Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:19.9688131Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9689777Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], 
(2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9690341Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9691227Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9691758Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9692511Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9693093Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9693839Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9694484Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9695000Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9695737Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9696043Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9696884Z E1204 10:19:46.836000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9696991Z ('RERUN', {'yellow': True}) [0.3698s] [100%] 2025-12-04T10:35:19.9697979Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T10:35:19.9698713Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9699136Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9699500Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T10:35:19.9699931Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T10:35:19.9700313Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9700761Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9701297Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * 
XBLOCK 2025-12-04T10:35:19.9701860Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9702392Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9702891Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9703284Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:19.9703755Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9704180Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9704592Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9704991Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9705551Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:19.9706048Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9706533Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:19.9707057Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9707572Z E1204 10:19:47.206000 78392 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:19.9708304Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:19.9708855Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:19.9709273Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:19.9709667Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:19.9710179Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:19.9710574Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:19.9711090Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:19.9711573Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:19.9712218Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:19.9712538Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9714323Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 
'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9714917Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9715864Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9716431Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9717242Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9717859Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9718665Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9719370Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9719924Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 
2025-12-04T10:35:19.9720714Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9721037Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9721934Z E1204 10:19:47.206000 78392 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9722016Z FAILED [0.3690s] [100%] 2025-12-04T10:35:19.9722022Z 2025-12-04T10:35:19.9722138Z ==================================== RERUNS ==================================== 2025-12-04T10:35:19.9722391Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T10:35:19.9722490Z Traceback (most recent call last): 2025-12-04T10:35:19.9722821Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9722951Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9723362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9723576Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9724007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9724168Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9724597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9724758Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9725270Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9725605Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9726203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9726352Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9726862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9726982Z return self._compile_to_module() 2025-12-04T10:35:19.9727491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9727657Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9728116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9728219Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9728632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9728827Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9729325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9729426Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9729847Z File "/tmp/tmpi94rswo_/qm/cqmul4ihb2q7mtn4idinpdmrnj3ke5mlqu7zft73jza6ojbzmikj.py", line 113, in 2025-12-04T10:35:19.9730233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9730322Z kernel.precompile( 2025-12-04T10:35:19.9730793Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9730885Z self._precompile_worker() 2025-12-04T10:35:19.9731393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9731538Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9732710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9732879Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9733255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9733459Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9733830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9734111Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9734302Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9734659Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9734729Z ^ 2025-12-04T10:35:19.9735154Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9735160Z 2025-12-04T10:35:19.9735919Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9735926Z 2025-12-04T10:35:19.9735930Z 2025-12-04T10:35:19.9736156Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9737008Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9737017Z 2025-12-04T10:35:19.9737335Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9737513Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9737599Z frames [('total', 1)] 2025-12-04T10:35:19.9737689Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9737887Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9738072Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9738147Z graph_break [] 2025-12-04T10:35:19.9738389Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T10:35:19.9738495Z Traceback (most recent call last): 2025-12-04T10:35:19.9738827Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9738950Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9739402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9739610Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9740044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9740206Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9740638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:35:19.9740756Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9741205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9741479Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9741915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9742037Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9742442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9742625Z return self._compile_to_module() 2025-12-04T10:35:19.9743037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9743174Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9743605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9743709Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9744127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:19.9744319Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9744824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9744926Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9745359Z File "/tmp/tmplyp9z0_d/qz/cqzpqb5g2dn76mrsrliltewzmsmd63hczhwscljqn3opivcxpppp.py", line 113, in 2025-12-04T10:35:19.9745800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:19.9745885Z kernel.precompile( 2025-12-04T10:35:19.9746356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9746494Z self._precompile_worker() 2025-12-04T10:35:19.9747000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9747186Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9747687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9747854Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9748239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9748444Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9748813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9749091Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9749283Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9749641Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9749712Z ^ 2025-12-04T10:35:19.9750099Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9750104Z 2025-12-04T10:35:19.9750714Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9750719Z 2025-12-04T10:35:19.9750723Z 2025-12-04T10:35:19.9750908Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9751543Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9751550Z 2025-12-04T10:35:19.9751773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9751948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9752030Z frames [('total', 1)] 2025-12-04T10:35:19.9752123Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9752316Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9752498Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9752658Z graph_break [] 2025-12-04T10:35:19.9752833Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9752914Z frames [('total', 1)] 2025-12-04T10:35:19.9753003Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9753180Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9753370Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9753449Z graph_break [] 2025-12-04T10:35:19.9753563Z =================================== FAILURES =================================== 2025-12-04T10:35:19.9753805Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T10:35:19.9753903Z Traceback (most recent call last): 2025-12-04T10:35:19.9754235Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:19.9754361Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:19.9754777Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:19.9754985Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:19.9755418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:19.9755574Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:19.9756056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:19.9756172Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:19.9756673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:19.9756940Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:19.9757380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:19.9757499Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:19.9757901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:19.9757997Z return self._compile_to_module() 2025-12-04T10:35:19.9758409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:19.9758539Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:19.9758981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:19.9759096Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:19.9759637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:19.9759913Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9760553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9760691Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9761125Z File "/tmp/tmpuwd5n7ww/c2/cc2if36af3aygo6zlipqsg5nkk7qdid33txk5tpovfoztw5djvu6.py", line 113, in 2025-12-04T10:35:19.9761522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9761611Z kernel.precompile( 2025-12-04T10:35:19.9762082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9762172Z self._precompile_worker() 2025-12-04T10:35:19.9762779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9762931Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9763437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9763601Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9763977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9764184Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9764553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9764837Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9765025Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9765387Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9765457Z ^ 2025-12-04T10:35:19.9765892Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9765896Z 2025-12-04T10:35:19.9766502Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9766552Z 2025-12-04T10:35:19.9766556Z 2025-12-04T10:35:19.9766734Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9767407Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9767415Z 2025-12-04T10:35:19.9767637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9767817Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9767901Z frames [('total', 1)] 2025-12-04T10:35:19.9767992Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9768185Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9768372Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9768452Z graph_break [] 2025-12-04T10:35:19.9768624Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9768707Z frames [('total', 1)] 2025-12-04T10:35:19.9768800Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9768982Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9769171Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9769247Z graph_break [] 2025-12-04T10:35:19.9769424Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9769510Z frames [('total', 1)] 2025-12-04T10:35:19.9769599Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9769779Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9769967Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:19.9770042Z graph_break [] 2025-12-04T10:35:19.9770599Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.xml - 2025-12-04T10:35:19.9770740Z =========================== short test summary info ============================ 2025-12-04T10:35:19.9771363Z FAILED [0.3690s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9771724Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:19.9771792Z ^ 2025-12-04T10:35:19.9772289Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9772294Z 2025-12-04T10:35:19.9772895Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9772900Z 2025-12-04T10:35:19.9772906Z 2025-12-04T10:35:19.9773086Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9773830Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9773838Z 2025-12-04T10:35:19.9774062Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9774207Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.9774375Z ================== 1 failed, 187 deselected, 2 rerun in 2.72s ================== 2025-12-04T10:35:19.9774457Z Got exit code 1 2025-12-04T10:35:19.9774882Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T10:35:19.9775234Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:19.9775627Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.xml 2025-12-04T10:35:19.9775806Z ============================= test session starts ============================== 2025-12-04T10:35:19.9776099Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.9776228Z cachedir: .pytest_cache 2025-12-04T10:35:19.9776672Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.9776779Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.9776865Z configfile: pytest.ini 2025-12-04T10:35:19.9777323Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:19.9777510Z collecting ... collected 188 items / 13 deselected / 175 selected 2025-12-04T10:35:19.9777625Z stepcurrent: skipping 13 already run items. 2025-12-04T10:35:19.9777719Z Running 175 items in this shard 2025-12-04T10:35:19.9777724Z 2025-12-04T10:35:19.9778710Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9779599Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.9779957Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9780330Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:19.9780717Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9781171Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9781628Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9782119Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9782691Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9783163Z E1204 10:19:56.806000 78607 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9783536Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.9784070Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9784560Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9785006Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9785451Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9785858Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9786259Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9786689Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9787329Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9787799Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9788293Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9788773Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9789237Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9789717Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9790155Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9790605Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9791032Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9791418Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9791784Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.9792260Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9792625Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.9793108Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9793633Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9794232Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9794528Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9796390Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9796840Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9797722Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9798289Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9799079Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9799656Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9800396Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9801045Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9801559Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9802385Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9802688Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9803442Z E1204 10:19:56.806000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9803550Z ('RERUN', {'yellow': True}) [1.6982s] [ 0%]
2025-12-04T10:35:19.9804524Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9805349Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9805783Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.9806158Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.9806541Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.9806991Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9807444Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9808118Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9808614Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9809076Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9809447Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.9810053Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9810538Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9811144Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9811589Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9812002Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9812400Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9812785Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9813424Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9813859Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9814355Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9814831Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9815291Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9815772Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9816207Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9816662Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9817194Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9817582Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9817947Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.9818420Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9818789Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.9819345Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9819795Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9820390Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9820684Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9822536Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9823071Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9823956Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9824483Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9825234Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9825809Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9826552Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9827197Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9827712Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9828533Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9828913Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9829677Z E1204 10:19:57.093000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9829785Z ('RERUN', {'yellow': True}) [0.2539s] [ 0%]
2025-12-04T10:35:19.9830760Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:19.9831587Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9831950Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:19.9832327Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:19.9832708Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.9833157Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9833654Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9834180Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9834678Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9835147Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9835529Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.9836056Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9836543Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9837002Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9837448Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9837860Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9838262Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9838653Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9839291Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9839726Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9840301Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9840780Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9841247Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9841724Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9842163Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9842620Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9843044Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9843434Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9843798Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.9844270Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9844682Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.9845160Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9845680Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9846304Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9846600Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9848453Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9848918Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9849797Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9850323Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9851085Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9851661Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9852486Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9853136Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9853655Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9854682Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9854999Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9855806Z E1204 10:19:57.347000 78607 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9855895Z FAILED [0.2518s] [ 0%]
2025-12-04T10:35:19.9855900Z
2025-12-04T10:35:19.9856020Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9856343Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:19.9856445Z Traceback (most recent call last):
2025-12-04T10:35:19.9856781Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9856948Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9857364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9857577Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9858015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9858176Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9858607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9858726Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9859229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9859502Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9859945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9860063Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9860471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9860576Z return self._compile_to_module()
2025-12-04T10:35:19.9860983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9861117Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9861553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9861657Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9862079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9862271Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9862851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9862954Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9863370Z File "/tmp/tmpii_aqm5m/dm/cdmq4hscslc3dxrhyn4irizq3gehd4b6o2o37xojoqw45umw3dlc.py", line 58, in
2025-12-04T10:35:19.9863761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9863851Z kernel.precompile(
2025-12-04T10:35:19.9864318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9864417Z self._precompile_worker()
2025-12-04T10:35:19.9864924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9865074Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9865615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9865793Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9866176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9866378Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9866796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9867078Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9867307Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9867755Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9867827Z ^
2025-12-04T10:35:19.9868217Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9868222Z
2025-12-04T10:35:19.9868833Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9868838Z
2025-12-04T10:35:19.9868842Z
2025-12-04T10:35:19.9869018Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9869665Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.9869672Z
2025-12-04T10:35:19.9869892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9870074Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9870155Z frames [('total', 1)]
2025-12-04T10:35:19.9870253Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9870457Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9870640Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9870716Z graph_break []
2025-12-04T10:35:19.9870964Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:19.9871062Z Traceback (most recent call last):
2025-12-04T10:35:19.9871405Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9871534Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9871941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9872152Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9872671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9872829Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9873262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9873381Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9873839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9874107Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9874545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9874672Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9875078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9875183Z return self._compile_to_module()
2025-12-04T10:35:19.9875635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9875780Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9876215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9876361Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9876776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9877015Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9877512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9877623Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9878068Z File "/tmp/tmpswodzstu/km/ckmuvog2sm7j37zwknidtfsvo2apzyznlu6sudtjbnnfbedyv6ef.py", line 58, in
2025-12-04T10:35:19.9878457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9878549Z kernel.precompile(
2025-12-04T10:35:19.9879018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:19.9879119Z self._precompile_worker()
2025-12-04T10:35:19.9879623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:19.9879775Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:19.9880285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9880451Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9880833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9881038Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9881406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9881692Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9881881Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:19.9882328Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9882406Z ^
2025-12-04T10:35:19.9882791Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9882795Z
2025-12-04T10:35:19.9883482Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:19.9883487Z
2025-12-04T10:35:19.9883491Z
2025-12-04T10:35:19.9883669Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:19.9884308Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:19.9884316Z
2025-12-04T10:35:19.9884534Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:19.9884713Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9884798Z frames [('total', 1)]
2025-12-04T10:35:19.9884890Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9885085Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9885282Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9885365Z graph_break []
2025-12-04T10:35:19.9885570Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:19.9885669Z frames [('total', 1)]
2025-12-04T10:35:19.9885765Z stats [('calls_captured', 6)]
2025-12-04T10:35:19.9885949Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:19.9886186Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:19.9886261Z graph_break []
2025-12-04T10:35:19.9886386Z =================================== FAILURES ===================================
2025-12-04T10:35:19.9886674Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:19.9886774Z Traceback (most recent call last):
2025-12-04T10:35:19.9887109Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9887242Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9887655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9887861Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9888292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9888454Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9888881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9889006Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9889452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9889726Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9890177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9890294Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9890694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9890800Z return self._compile_to_module()
2025-12-04T10:35:19.9891205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9891341Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9891778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9891882Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9892404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9892598Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:19.9893106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:19.9893206Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:19.9893639Z File "/tmp/tmpeepljok7/2j/c2jg7xzpv7phtsa45bia2pdg4bfryj76begqwidhfobbf3bkzz7x.py", line 58, in 2025-12-04T10:35:19.9894036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:19.9894127Z kernel.precompile( 2025-12-04T10:35:19.9894593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:19.9894693Z self._precompile_worker() 2025-12-04T10:35:19.9895204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:19.9895351Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:19.9895854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9896016Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9896440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9896641Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9897052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9897334Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9897535Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9897985Z def 
triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.9898050Z ^ 2025-12-04T10:35:19.9898439Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9898448Z 2025-12-04T10:35:19.9899115Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9899123Z 2025-12-04T10:35:19.9899127Z 2025-12-04T10:35:19.9899310Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9899947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.9899951Z 2025-12-04T10:35:19.9900181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9900366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9900446Z frames [('total', 1)] 2025-12-04T10:35:19.9900540Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9900740Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9900922Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9900999Z graph_break [] 2025-12-04T10:35:19.9901176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9901255Z frames [('total', 1)] 2025-12-04T10:35:19.9901350Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9901534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9901726Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9901808Z graph_break [] 2025-12-04T10:35:19.9902060Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:19.9902141Z frames [('total', 1)] 2025-12-04T10:35:19.9902240Z stats [('calls_captured', 6)] 2025-12-04T10:35:19.9902417Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:19.9902608Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:19.9902694Z graph_break [] 2025-12-04T10:35:19.9903247Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.xml - 2025-12-04T10:35:19.9903388Z =========================== short test summary info ============================ 2025-12-04T10:35:19.9904011Z FAILED [0.2518s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:19.9904459Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.9904530Z ^ 2025-12-04T10:35:19.9904916Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9904920Z 2025-12-04T10:35:19.9905572Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:19.9905624Z 2025-12-04T10:35:19.9905627Z 2025-12-04T10:35:19.9905805Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:19.9906444Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.9906489Z 2025-12-04T10:35:19.9906710Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:19.9906862Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:19.9907029Z ================== 1 failed, 13 deselected, 2 rerun in 2.24s =================== 2025-12-04T10:35:19.9907104Z Got exit code 1 2025-12-04T10:35:19.9907188Z Retrying single test... 2025-12-04T10:35:19.9907590Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.xml 2025-12-04T10:35:19.9907722Z ============================= test session starts ============================== 2025-12-04T10:35:19.9908182Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:19.9908270Z cachedir: .pytest_cache 2025-12-04T10:35:19.9908715Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:19.9908821Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:19.9908909Z configfile: pytest.ini 2025-12-04T10:35:19.9909373Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:19.9909560Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:19.9910125Z stepcurrent: skipping 13 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:19.9910224Z Running 1 items in this shard 2025-12-04T10:35:19.9910228Z 2025-12-04T10:35:19.9911206Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9912167Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.9912529Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9912903Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:19.9913289Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9913739Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9914195Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9914684Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9915179Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9915705Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9916077Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:19.9916677Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:19.9917162Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:19.9917665Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:19.9918111Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9918519Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9918920Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9919309Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9919951Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:19.9920384Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.9920880Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9921365Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = 
triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T10:35:19.9921829Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T10:35:19.9922312Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T10:35:19.9922749Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9923199Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:19.9923715Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T10:35:19.9924104Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:19.9924473Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:19.9924948Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:19.9925312Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:19.9925849Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:19.9926296Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:19.9926899Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 
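The dumped kernel `triton_red_fused__to_copy_abs_amax_clamp_mul_0` does four things. It takes a running maximum of `|x|`, starting from `-inf`. It multiplies that amax by the scalar loaded from `in_ptr1`. It clamps the product to [-448, 448], the finite range of float8_e4m3fn. Finally, it casts the result to `tl.float8e4nv`. Only the cast fails, because the metadata reports `'cc': 86` and the error lists `('fp8e4b15', 'fp8e5')` as the only fp8 types Triton accepts there.

The sketch below restates the kernel's arithmetic in plain Python and adds a guard on compute capability. The helper names are hypothetical. The 8.9 threshold is an assumption, based on Triton's NVIDIA backend enabling fp8e4nv only on sm_89 and newer, and is consistent with the cc 86 failure in this log.

```python
# Hypothetical helpers, not part of PyTorch or Triton.
FP8_E4M3FN_MAX = 448.0  # largest finite float8_e4m3fn value (tmp9 / tmp11 in the kernel)


def amax_scale_clamp(x, scale):
    """Plain-Python mirror of the fused reduction: clamp(max(|x|) * scale, -448, 448).

    The final fp8 cast (tmp13) is not modelled; the result stays a Python float.
    """
    amax = float("-inf")  # same initial value as _tmp3
    for v in x:
        amax = max(amax, abs(float(v)))  # tmp1 = abs, tmp4 = running maximum
    scaled = amax * scale  # tmp8 = tmp5 * tmp7
    return min(max(scaled, -FP8_E4M3FN_MAX), FP8_E4M3FN_MAX)  # tmp10, tmp12


def triton_supports_fp8e4nv(capability):
    """Return True if a (major, minor) compute capability is 8.9 or newer.

    Assumption: Triton only accepts fp8e4nv on sm_89+; sm_86 (this runner) is rejected.
    """
    return tuple(capability) >= (8, 9)
```

In a test, the capability tuple would come from `torch.cuda.get_device_capability()`. A test that lowers to `float8_e4m3fn` could then be skipped when the guard returns False, rather than failing on the A10G runners that this job uses.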
2025-12-04T10:35:19.9927195Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9929100Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9929602Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9930491Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9931022Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9931775Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9932351Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9933094Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:19.9933749Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9934267Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9935095Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.9935505Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9936259Z E1204 10:20:07.435000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9936370Z ('RERUN', {'yellow': True}) [1.6737s] [100%] 2025-12-04T10:35:19.9937343Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9938165Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.9938526Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9938902Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:19.9939326Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:19.9939821Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:19.9940279Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:19.9940811Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:19.9941317Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:19.9941782Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:19.9942159Z E1204 
10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:19.9942690Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:19.9943179Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:19.9943629Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:19.9944075Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:19.9944490Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:19.9944889Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:19.9945275Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:19.9945920Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:19.9946352Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T10:35:19.9946927Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:19.9947407Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T10:35:19.9947870Z E1204 10:20:07.720000 
78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T10:35:19.9948356Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T10:35:19.9948793Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:19.9949255Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:19.9949679Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T10:35:19.9950067Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:19.9950436Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:19.9950910Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:19.9951323Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:19.9951841Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:19.9952291Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:19.9952887Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:19.9953185Z E1204 10:20:07.720000 78788 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:19.9955041Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:19.9955535Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:19.9956432Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:19.9956969Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:19.9957721Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:19.9958372Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:19.9959120Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, 
in make_ir 2025-12-04T10:35:19.9959772Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:19.9960286Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:19.9961111Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.9961414Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:19.9962169Z E1204 10:20:07.720000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:19.9962276Z ('RERUN', {'yellow': True}) [0.2523s] [100%] 2025-12-04T10:35:19.9963247Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T10:35:19.9964112Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:19.9964509Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:19.9964887Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 
2025-12-04T10:35:19.9965271Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:19.9965773Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:19.9966226Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:19.9966715Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:19.9967203Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:19.9967670Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:19.9968045Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:19.9968569Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:19.9969056Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:19.9969506Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:19.9970032Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:19.9970444Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:19.9970843Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:19.9971231Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:19.9971870Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:19.9972302Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:19.9976450Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:19.9976954Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:19.9977423Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:19.9977902Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:19.9978402Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:19.9978927Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:19.9979438Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:19.9979830Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:19.9980198Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:19.9980675Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:19.9981046Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:19.9981525Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:19.9981980Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:19.9982582Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:19.9982881Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:19.9984747Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:19.9985289Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:19.9986174Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:19.9986703Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:19.9987463Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:19.9988036Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:19.9988784Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:19.9989435Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:19.9990071Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:19.9990894Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:19.9991235Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:19.9991997Z E1204 10:20:07.972000 78788 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:19.9992081Z FAILED [0.2512s] [100%]
2025-12-04T10:35:19.9992087Z
2025-12-04T10:35:19.9992206Z ==================================== RERUNS ====================================
2025-12-04T10:35:19.9992452Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:19.9992554Z Traceback (most recent call last):
2025-12-04T10:35:19.9992888Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:19.9993018Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:19.9993430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:19.9993642Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:19.9994076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:19.9994235Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:19.9994665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:19.9994786Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:19.9995243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:19.9995511Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:19.9995956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:19.9996074Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:19.9996558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:19.9996658Z return self._compile_to_module()
2025-12-04T10:35:19.9997062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:19.9997198Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:19.9997633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:19.9997738Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:19.9998154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:19.9998347Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:19.9998847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:19.9998952Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:19.9999364Z File "/tmp/tmpeo1z9_ac/ki/ckiqfrau675jrcajjq225jzsvpaepza2evcxwh7p7veeyxvro6bx.py", line 58, in <module>
2025-12-04T10:35:19.9999757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:19.9999843Z kernel.precompile(
2025-12-04T10:35:20.0000357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0000457Z self._precompile_worker()
2025-12-04T10:35:20.0001000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0001150Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0001657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0001819Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0002198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0002398Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0002770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0003052Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0003241Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0003691Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0003759Z ^
2025-12-04T10:35:20.0004148Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0004153Z
2025-12-04T10:35:20.0004758Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0004763Z
2025-12-04T10:35:20.0004768Z
2025-12-04T10:35:20.0004944Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0005587Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0005594Z
2025-12-04T10:35:20.0005815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0005995Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0006079Z frames [('total', 1)]
2025-12-04T10:35:20.0006251Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0006455Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0006637Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0006713Z graph_break []
2025-12-04T10:35:20.0006958Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:20.0007053Z Traceback (most recent call last):
2025-12-04T10:35:20.0007387Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0007515Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0008084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0008298Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0008734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0008891Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0009323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0009439Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0009890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0010233Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0010670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0010846Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0011246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0011349Z return self._compile_to_module()
2025-12-04T10:35:20.0011755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0011885Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0012320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0012426Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0012841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0013031Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0013527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0013629Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0014057Z File "/tmp/tmprza68jrp/5r/c5rfu6b2g2c7rswcp6uwh5bi6rtxldexbdcwxcfypb53wsc5i2mk.py", line 58, in <module>
2025-12-04T10:35:20.0014448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0014537Z kernel.precompile(
2025-12-04T10:35:20.0015003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0015101Z self._precompile_worker()
2025-12-04T10:35:20.0015604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0015752Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0016260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0016423Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0016909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0017116Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0017485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0017767Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0017960Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0018405Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0018477Z ^
2025-12-04T10:35:20.0018860Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0018865Z
2025-12-04T10:35:20.0019521Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0019527Z
2025-12-04T10:35:20.0019531Z
2025-12-04T10:35:20.0019708Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0020338Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0020421Z
2025-12-04T10:35:20.0020642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0020857Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0020942Z frames [('total', 1)]
2025-12-04T10:35:20.0021032Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0021228Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0021422Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0021502Z graph_break []
2025-12-04T10:35:20.0021681Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0021760Z frames [('total', 1)]
2025-12-04T10:35:20.0021850Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0022030Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0022227Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0022301Z graph_break []
2025-12-04T10:35:20.0022419Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0022662Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:20.0022761Z Traceback (most recent call last):
2025-12-04T10:35:20.0023094Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0023221Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0023632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0023835Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0024265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0024433Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0024861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0024983Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0025435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0025732Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0026423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0026599Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0027121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0027260Z return self._compile_to_module()
2025-12-04T10:35:20.0027815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0028004Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0028495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0028599Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0029021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0029211Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0029708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0029809Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0030243Z File "/tmp/tmpl5qag5at/2n/c2neobpztvrnquo7jjy4vsoucdfkatytianoxd34445csppfjpoc.py", line 58, in <module>
2025-12-04T10:35:20.0030693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0030820Z kernel.precompile(
2025-12-04T10:35:20.0031289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0031383Z self._precompile_worker()
2025-12-04T10:35:20.0031890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0032038Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0032539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0032700Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0033083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0033283Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0033656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0033935Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0034123Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0034577Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0034646Z ^
2025-12-04T10:35:20.0035031Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0035038Z
2025-12-04T10:35:20.0035638Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0035645Z
2025-12-04T10:35:20.0035649Z
2025-12-04T10:35:20.0035826Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0036466Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0036472Z
2025-12-04T10:35:20.0036770Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0036951Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0037035Z frames [('total', 1)]
2025-12-04T10:35:20.0037127Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0037328Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0037510Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0037591Z graph_break []
2025-12-04T10:35:20.0037768Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0037846Z frames [('total', 1)]
2025-12-04T10:35:20.0037949Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0038127Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0038319Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0038395Z graph_break []
2025-12-04T10:35:20.0038571Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0038650Z frames [('total', 1)]
2025-12-04T10:35:20.0038742Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0038917Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0039107Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0039187Z graph_break []
2025-12-04T10:35:20.0039787Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.xml -
2025-12-04T10:35:20.0039927Z =========================== short test summary info ============================
2025-12-04T10:35:20.0040592Z FAILED [0.2512s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0041045Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0041113Z ^
2025-12-04T10:35:20.0041499Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0041503Z
2025-12-04T10:35:20.0042105Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0042113Z
2025-12-04T10:35:20.0042117Z
2025-12-04T10:35:20.0042294Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0042928Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0042936Z
2025-12-04T10:35:20.0043156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0043307Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0043473Z ================== 1 failed, 187 deselected, 2 rerun in 2.21s ==================
2025-12-04T10:35:20.0043548Z Got exit code 1
2025-12-04T10:35:20.0043631Z Retrying single test...
2025-12-04T10:35:20.0044033Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.xml
2025-12-04T10:35:20.0044166Z ============================= test session starts ==============================
2025-12-04T10:35:20.0044456Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0044542Z cachedir: .pytest_cache
2025-12-04T10:35:20.0044986Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0045087Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0045171Z configfile: pytest.ini
2025-12-04T10:35:20.0045710Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0045899Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.0046460Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0046556Z Running 1 items in this shard
2025-12-04T10:35:20.0046563Z
2025-12-04T10:35:20.0047543Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:20.0048375Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0048733Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.0049108Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:20.0049493Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.0049987Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0050441Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0050970Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0051463Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0051932Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0052305Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.0052839Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.0053324Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.0053768Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.0054216Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0054625Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.0055028Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.0055418Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.0056059Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.0056491Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:20.0057084Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0057568Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:20.0058031Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:20.0058513Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:20.0058950Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0059475Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0059905Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:20.0060292Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0060657Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:20.0061174Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0061536Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:20.0062057Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0062512Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0063111Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0063407Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.0065271Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0065726Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0066610Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0067139Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0067891Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0068540Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0069283Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0069938Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0070454Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0071280Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0071584Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0072339Z E1204 10:20:18.064000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0072448Z ('RERUN', {'yellow': True}) [1.6776s] [100%]
2025-12-04T10:35:20.0073462Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:20.0074490Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0074968Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.0075396Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:20.0075779Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.0076228Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0076687Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0077178Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0077674Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0078138Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0078511Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.0079045Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.0079534Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.0079988Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.0080527Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0080940Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.0081344Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.0081732Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.0082372Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.0082806Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:20.0083306Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0083785Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:20.0084251Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:20.0084774Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:20.0085210Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0085755Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0086183Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:20.0086568Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0086937Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:20.0087413Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0087782Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:20.0088365Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0088813Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0089423Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0089724Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.0091587Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0092126Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0093013Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0093543Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0094298Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0094877Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0095621Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0096275Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0096830Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0097658Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0098008Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0098767Z E1204 10:20:18.349000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0098872Z ('RERUN', {'yellow': True}) [0.2524s] [100%]
2025-12-04T10:35:20.0099897Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0
2025-12-04T10:35:20.0100724Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0101086Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.0101460Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:20.0101843Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.0102292Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0102746Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0103235Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0103727Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0104301Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0104680Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.0105208Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.0105746Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.0106194Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.0106635Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0107051Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.0107451Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.0107982Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.0108698Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.0109203Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0)
2025-12-04T10:35:20.0109704Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0110182Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2)
2025-12-04T10:35:20.0110647Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3)
2025-12-04T10:35:20.0111124Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None]
2025-12-04T10:35:20.0111562Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0112018Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0112440Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32)
2025-12-04T10:35:20.0112835Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0113199Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:20.0113672Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0114039Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:20.0114517Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0114968Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0115722Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0116021Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.0117876Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0118336Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0119219Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0119748Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0120545Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0121169Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0121924Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0122580Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0123105Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0123937Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0124240Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0125000Z E1204 10:20:18.601000 78969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0125080Z FAILED [0.2500s] [100%]
2025-12-04T10:35:20.0125085Z
2025-12-04T10:35:20.0125208Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0125450Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:20.0125553Z Traceback (most recent call last):
2025-12-04T10:35:20.0125891Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0126020Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0126443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0126650Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0127167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0127332Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0127764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0127886Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0128341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0128610Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0129069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0129191Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0129594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0129692Z return self._compile_to_module()
2025-12-04T10:35:20.0130100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0130234Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0130716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0130821Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0131244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0131479Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0131977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0132086Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0132514Z File "/tmp/tmpsdp16qvd/oq/coqvl7e4avnrb4webtk7gnbgy4jwbaj35i6key6dw7uioiq6dn35.py", line 58, in
2025-12-04T10:35:20.0132911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0132996Z kernel.precompile(
2025-12-04T10:35:20.0133469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0133567Z self._precompile_worker()
2025-12-04T10:35:20.0134075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0134228Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0134733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0134899Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0135282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0135483Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0135856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0136143Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0136333Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0136783Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.0136852Z ^
2025-12-04T10:35:20.0137323Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0137328Z
2025-12-04T10:35:20.0137932Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0137937Z
2025-12-04T10:35:20.0137941Z
2025-12-04T10:35:20.0138119Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0138764Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda
2025-12-04T10:35:20.0138771Z
2025-12-04T10:35:20.0138991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0139215Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0139303Z frames [('total', 1)]
2025-12-04T10:35:20.0139398Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0139611Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.0139794Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0139870Z graph_break []
2025-12-04T10:35:20.0140116Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____
2025-12-04T10:35:20.0140213Z Traceback (most recent call last):
2025-12-04T10:35:20.0140588Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0140720Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0141128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0141404Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0141835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0141996Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0142431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0142545Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0142996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0143268Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0143705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0143834Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0144235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0144340Z return self._compile_to_module()
2025-12-04T10:35:20.0144750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0144881Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0145320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0145427Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0145842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0146034Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0146535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0146640Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0147162Z File "/tmp/tmpc12taikz/av/cavvtprixkkgxxwzzwpwqn4efewmgqhtskoya27wplgesdx3fcwk.py", line 58, in
2025-12-04T10:35:20.0147557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0147649Z kernel.precompile( 2025-12-04T10:35:20.0148115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.0148218Z self._precompile_worker() 2025-12-04T10:35:20.0148723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.0148870Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.0149375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0149536Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0149918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0150121Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0150489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0150771Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0151003Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.0151452Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.0151562Z ^ 2025-12-04T10:35:20.0151952Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0151957Z 2025-12-04T10:35:20.0152570Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0152575Z 2025-12-04T10:35:20.0152578Z 2025-12-04T10:35:20.0152756Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0153388Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:20.0153402Z 2025-12-04T10:35:20.0153624Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0153805Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0153893Z frames [('total', 1)] 2025-12-04T10:35:20.0153983Z stats [('calls_captured', 6)] 2025-12-04T10:35:20.0154183Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.0154381Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0154459Z graph_break [] 2025-12-04T10:35:20.0154632Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0154713Z frames [('total', 1)] 2025-12-04T10:35:20.0154803Z stats [('calls_captured', 6)] 2025-12-04T10:35:20.0154986Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0155186Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.0155263Z graph_break [] 2025-12-04T10:35:20.0155392Z =================================== FAILURES =================================== 2025-12-04T10:35:20.0155637Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T10:35:20.0155735Z Traceback (most recent call last): 2025-12-04T10:35:20.0156068Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:20.0156360Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:20.0156777Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.0156985Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.0157423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.0157588Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.0158020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.0158141Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.0158596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.0158868Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.0159314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.0159431Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.0159832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.0159931Z return self._compile_to_module() 2025-12-04T10:35:20.0160380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.0160516Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.0160996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.0161105Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.0161530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:20.0161719Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.0162218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.0162320Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.0162750Z File "/tmp/tmpv11tio94/jb/cjbvpmlg5e7xcmkzlsydiijqgtcyl4kztxx55xndnk4zpwgtn6x5.py", line 58, in 2025-12-04T10:35:20.0163145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.0163230Z kernel.precompile( 2025-12-04T10:35:20.0163699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.0163795Z self._precompile_worker() 2025-12-04T10:35:20.0164301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.0164450Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.0164952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0165113Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0165498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0165697Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0166071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0166359Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0166545Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.0167078Z def 
triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.0167149Z ^ 2025-12-04T10:35:20.0167535Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0167547Z 2025-12-04T10:35:20.0168149Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0168156Z 2025-12-04T10:35:20.0168160Z 2025-12-04T10:35:20.0168335Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0168971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:20.0168976Z 2025-12-04T10:35:20.0169282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0169465Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0169544Z frames [('total', 1)] 2025-12-04T10:35:20.0169635Z stats [('calls_captured', 6)] 2025-12-04T10:35:20.0169836Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.0170017Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0170140Z graph_break [] 2025-12-04T10:35:20.0170319Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0170396Z frames [('total', 1)] 2025-12-04T10:35:20.0170529Z stats [('calls_captured', 6)] 2025-12-04T10:35:20.0170707Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0170902Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.0170979Z graph_break [] 2025-12-04T10:35:20.0171159Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0171239Z frames [('total', 1)] 2025-12-04T10:35:20.0171331Z stats [('calls_captured', 6)] 2025-12-04T10:35:20.0171507Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0171696Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.0171777Z graph_break [] 2025-12-04T10:35:20.0172334Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.xml - 2025-12-04T10:35:20.0172474Z =========================== short test summary info ============================ 2025-12-04T10:35:20.0173088Z FAILED [0.2500s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.0173539Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.0173608Z ^ 2025-12-04T10:35:20.0174001Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0174005Z 2025-12-04T10:35:20.0174613Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0174620Z 2025-12-04T10:35:20.0174624Z 2025-12-04T10:35:20.0174801Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0175433Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:20.0175440Z 2025-12-04T10:35:20.0175663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0175887Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.0176057Z ================== 1 failed, 187 deselected, 2 rerun in 2.21s ================== 2025-12-04T10:35:20.0176134Z Got exit code 1 2025-12-04T10:35:20.0176557Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T10:35:20.0176911Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.0177307Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.xml 2025-12-04T10:35:20.0177448Z ============================= test session starts ============================== 2025-12-04T10:35:20.0177738Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.0177830Z cachedir: .pytest_cache 2025-12-04T10:35:20.0178282Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.0178381Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.0178470Z configfile: pytest.ini 2025-12-04T10:35:20.0178932Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:20.0179168Z collecting ... collected 188 items / 14 deselected / 174 selected 2025-12-04T10:35:20.0179359Z stepcurrent: skipping 14 already run items. 2025-12-04T10:35:20.0179451Z Running 174 items in this shard 2025-12-04T10:35:20.0179456Z 2025-12-04T10:35:20.0180452Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T10:35:20.0181236Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0181594Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.0181967Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 320 2025-12-04T10:35:20.0182402Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 512 2025-12-04T10:35:20.0182794Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.0183245Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.0183702Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.0184203Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.0184695Z E1204 10:20:28.949000 79150 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.0185164Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.0185584Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.0186024Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.0186419Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.0186882Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.0187258Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.0187754Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:20.0188192Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.0188647Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:20.0189136Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.0189621Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:20.0190146Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.0190573Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:20.0191003Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:20.0191368Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:20.0191893Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:20.0192263Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:20.0192748Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:20.0193192Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:20.0193790Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:20.0194093Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.0195884Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 
'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.0196338Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.0197222Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0197755Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0198587Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0199170Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0199915Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0200569Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0201088Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.0201827Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, 
r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0202133Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.0202887Z E1204 10:20:28.949000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0203043Z ('RERUN', {'yellow': True}) [1.9725s] [ 0%] 2025-12-04T10:35:20.0204069Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T10:35:20.0204805Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0205164Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.0205531Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 320 2025-12-04T10:35:20.0206026Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 512 2025-12-04T10:35:20.0206407Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.0206859Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.0207317Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.0207962Z E1204 10:20:29.369000 79150 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.0208462Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.0208933Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.0209297Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.0209736Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.0210251Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.0210639Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.0211008Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.0211502Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:20.0211952Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.0212406Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:20.0212904Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.0213383Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:20.0213909Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.0214393Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:20.0214785Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:20.0215213Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:20.0215744Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:20.0216116Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:20.0216595Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:20.0217041Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:20.0217646Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:20.0217950Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.0219734Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 
1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.0220192Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.0221079Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0221723Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0222479Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0223049Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0223796Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0224454Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0224975Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.0225715Z E1204 
10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0226015Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0226815Z E1204 10:20:29.369000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0226959Z ('RERUN', {'yellow': True}) [0.3892s] [ 0%]
2025-12-04T10:35:20.0227952Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0228685Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0229041Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.0229414Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 320
2025-12-04T10:35:20.0229849Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0230236Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.0230692Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0231147Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0231637Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0232129Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0232609Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0232978Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.0233489Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0233889Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.0234268Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.0234641Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.0235139Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0235578Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0236035Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0236525Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0237007Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0237530Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0238010Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0238400Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0238807Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:20.0239288Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0239656Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:20.0240133Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0240587Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0241188Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0241495Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.0243221Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0243677Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0244561Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0245175Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0245928Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0246500Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0247249Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0247906Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0248424Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0249155Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0249509Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0250265Z E1204 10:20:29.757000 79150 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0250385Z FAILED [0.3863s] [ 0%]
2025-12-04T10:35:20.0250389Z
2025-12-04T10:35:20.0250508Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0250761Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0250862Z Traceback (most recent call last):
2025-12-04T10:35:20.0251193Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0251320Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0251732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0251942Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0252374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0252535Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0252967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0253091Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0253542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0253811Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0254256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0254378Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0254789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0254891Z return self._compile_to_module()
2025-12-04T10:35:20.0255298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0255434Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0255952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0256058Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0256482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0256673Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0257178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0257279Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0257717Z File "/tmp/tmp3r1dizft/dw/cdwt2lywtnk5z527vsp3g7wsnxlvgovzbfo7fyv6ykrzehhwupqk.py", line 118, in
2025-12-04T10:35:20.0261965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0262066Z kernel.precompile(
2025-12-04T10:35:20.0262560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0262659Z self._precompile_worker()
2025-12-04T10:35:20.0263166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0263317Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0263891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0264060Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0264481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0264684Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0265066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0265348Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0265548Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0265918Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0265988Z ^
2025-12-04T10:35:20.0266375Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0266381Z
2025-12-04T10:35:20.0266986Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0266992Z
2025-12-04T10:35:20.0266995Z
2025-12-04T10:35:20.0267174Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0267829Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0267836Z
2025-12-04T10:35:20.0268057Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0268241Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0268327Z frames [('total', 1)]
2025-12-04T10:35:20.0268418Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0268616Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0268800Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0268884Z graph_break []
2025-12-04T10:35:20.0269127Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0269233Z Traceback (most recent call last):
2025-12-04T10:35:20.0269678Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0269807Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0270216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0270424Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0270859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0271024Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0271451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0271572Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0272028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0272297Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0272740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0272858Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0273260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0273405Z return self._compile_to_module()
2025-12-04T10:35:20.0273810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0273986Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0274426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0274536Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0274954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0275144Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0275665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0275796Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0276215Z File "/tmp/tmp11_fddqf/ga/cga3hgv4qpzsymnco5mighyf4awpn5cxvjoxb5wf3wn7cpoaxeb3.py", line 118, in
2025-12-04T10:35:20.0276613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0276707Z kernel.precompile(
2025-12-04T10:35:20.0277175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0277274Z self._precompile_worker()
2025-12-04T10:35:20.0277776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0277920Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0278424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0278591Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0278970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0279173Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0279541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0279905Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0280097Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0280459Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0280529Z ^
2025-12-04T10:35:20.0280912Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0280919Z
2025-12-04T10:35:20.0281526Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0281533Z
2025-12-04T10:35:20.0281537Z
2025-12-04T10:35:20.0281714Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0282368Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0282373Z
2025-12-04T10:35:20.0282594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0282770Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0282859Z frames [('total', 1)]
2025-12-04T10:35:20.0282953Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0283149Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0283376Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0283454Z graph_break []
2025-12-04T10:35:20.0283630Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0283778Z frames [('total', 1)]
2025-12-04T10:35:20.0283870Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0284052Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0284248Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0284328Z graph_break []
2025-12-04T10:35:20.0284446Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0284688Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0284787Z Traceback (most recent call last):
2025-12-04T10:35:20.0285119Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0285246Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0285661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0285868Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0286301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0286464Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0286892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0287010Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0287457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0287727Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0288165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0288286Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0288689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0288786Z return self._compile_to_module()
2025-12-04T10:35:20.0289280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0289420Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0289852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0289959Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0290380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0290571Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0291071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0291173Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0291611Z File "/tmp/tmpuq2q77kl/nb/cnbfqdj55nl5pli74rjeyhi3zqxsld57w6qxruczhqjett2weamt.py", line 118, in
2025-12-04T10:35:20.0292004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0292094Z kernel.precompile(
2025-12-04T10:35:20.0292564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0292655Z self._precompile_worker()
2025-12-04T10:35:20.0293200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0293347Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0293890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0294055Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0294441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0294640Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0295010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0295291Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0295506Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0295891Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0295962Z ^
2025-12-04T10:35:20.0296348Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0296352Z
2025-12-04T10:35:20.0296960Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0296964Z
2025-12-04T10:35:20.0296968Z
2025-12-04T10:35:20.0297149Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0297791Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0297798Z
2025-12-04T10:35:20.0298019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0298197Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0298282Z frames [('total', 1)]
2025-12-04T10:35:20.0298376Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0298573Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0298755Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0298834Z graph_break []
2025-12-04T10:35:20.0299142Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0299229Z frames [('total', 1)]
2025-12-04T10:35:20.0299327Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0299506Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0299697Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0299782Z graph_break []
2025-12-04T10:35:20.0299957Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0300041Z frames [('total', 1)]
2025-12-04T10:35:20.0300132Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0300313Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0300504Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0300585Z graph_break []
2025-12-04T10:35:20.0301141Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.xml -
2025-12-04T10:35:20.0301283Z =========================== short test summary info ============================
2025-12-04T10:35:20.0301912Z FAILED [0.3863s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0302278Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0302390Z ^
2025-12-04T10:35:20.0302775Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0302844Z
2025-12-04T10:35:20.0303444Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0303449Z
2025-12-04T10:35:20.0303452Z
2025-12-04T10:35:20.0303636Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0304279Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0304284Z
2025-12-04T10:35:20.0304506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0304654Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0304822Z ================== 1 failed, 14 deselected, 2 rerun in 2.78s ===================
2025-12-04T10:35:20.0304900Z Got exit code 1
2025-12-04T10:35:20.0304993Z Retrying single test...
2025-12-04T10:35:20.0305392Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.xml
2025-12-04T10:35:20.0305521Z ============================= test session starts ==============================
2025-12-04T10:35:20.0305819Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0305905Z cachedir: .pytest_cache
2025-12-04T10:35:20.0306346Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0306451Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0306538Z configfile: pytest.ini
2025-12-04T10:35:20.0307002Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0307183Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.0307926Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0308021Z Running 1 items in this shard
2025-12-04T10:35:20.0308026Z
2025-12-04T10:35:20.0309151Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0309897Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0310255Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.0310625Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 320
2025-12-04T10:35:20.0311064Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0311452Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.0311900Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0312351Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0312900Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0313388Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0313910Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0314287Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.0314721Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0315119Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.0315529Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.0315926Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.0316425Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0316862Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0317320Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0317804Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0318283Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0318812Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0319241Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0319630Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0320073Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:20.0320549Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0320915Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:20.0321395Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0321844Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0322443Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0322750Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.0324480Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0325011Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0326012Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0326579Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0327386Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0328007Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0328811Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0329511Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0330066Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0330854Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0331178Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0331997Z E1204 10:20:39.678000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0332269Z ('RERUN', {'yellow': True}) [2.0059s] [100%]
2025-12-04T10:35:20.0333262Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0333990Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0334349Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.0334722Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 320
2025-12-04T10:35:20.0335162Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0335547Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.0335997Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0336451Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0336981Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0,
XBLOCK)[:, None] 2025-12-04T10:35:20.0337510Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.0337977Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.0338347Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.0338781Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.0339216Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.0339601Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.0339971Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.0340466Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:20.0340908Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.0341359Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:20.0341841Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.0342324Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:20.0342845Z E1204 10:20:40.116000 
79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.0343273Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:20.0343744Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:20.0344113Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:20.0344588Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:20.0344949Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:20.0345443Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:20.0346067Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:20.0346848Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:20.0347195Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.0348927Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 
16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.0349513Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.0350402Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0350933Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0351690Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0352264Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0353014Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0353666Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0354177Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.0354910Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0355214Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.0356046Z E1204 10:20:40.116000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0356156Z ('RERUN', {'yellow': True}) [0.4051s] [100%] 2025-12-04T10:35:20.0357150Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T10:35:20.0357887Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0358242Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.0358610Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 320 2025-12-04T10:35:20.0359051Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 512 2025-12-04T10:35:20.0359433Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.0359883Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.0360380Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.0360867Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.0361506Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.0361977Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.0362345Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.0362781Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.0363178Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.0363560Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.0363932Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.0364429Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T10:35:20.0364870Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.0365324Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T10:35:20.0365812Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.0366293Z E1204 
10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T10:35:20.0366818Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.0367357Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T10:35:20.0367754Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T10:35:20.0368118Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T10:35:20.0368590Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T10:35:20.0368957Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T10:35:20.0369434Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T10:35:20.0369885Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T10:35:20.0370486Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T10:35:20.0370782Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.0372505Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 
'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.0373038Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.0373921Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0374447Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0375204Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0375826Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0376579Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0377230Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0377741Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.0378478Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0378779Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.0379667Z E1204 10:20:40.507000 79365 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0379751Z FAILED [0.3898s] [100%] 2025-12-04T10:35:20.0379756Z 2025-12-04T10:35:20.0379873Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.0380119Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T10:35:20.0380222Z Traceback (most recent call last): 2025-12-04T10:35:20.0380555Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:20.0380679Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:20.0381092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.0381305Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.0381742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.0381903Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.0382331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.0382446Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.0382944Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.0383213Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.0383694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.0383811Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.0384218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.0384315Z return self._compile_to_module() 2025-12-04T10:35:20.0384719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.0384851Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.0385320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.0385466Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.0385963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.0386162Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.0386657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.0386766Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.0387208Z File "/tmp/tmpjero85qf/vu/cvupuyoingroaq2iflrglemabqdojijubjjtd5qqp7g3j3o27tbc.py", line 118, in 2025-12-04T10:35:20.0387598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.0387685Z kernel.precompile( 2025-12-04T10:35:20.0388154Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.0388255Z self._precompile_worker() 2025-12-04T10:35:20.0388758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.0388905Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.0389407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0389690Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0390100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0390316Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0390712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0391016Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0391218Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.0391608Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0391679Z ^ 2025-12-04T10:35:20.0392090Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0392104Z 2025-12-04T10:35:20.0392756Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0392761Z 2025-12-04T10:35:20.0392765Z 2025-12-04T10:35:20.0392953Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0393647Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.0393694Z 2025-12-04T10:35:20.0393915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0394132Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0394215Z frames [('total', 1)] 2025-12-04T10:35:20.0394305Z stats [('calls_captured', 6)] 2025-12-04T10:35:20.0394507Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0394688Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0394766Z graph_break [] 2025-12-04T10:35:20.0395012Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T10:35:20.0395108Z Traceback (most recent call last): 2025-12-04T10:35:20.0395437Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:20.0395566Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:20.0395973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.0396182Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.0396611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.0396772Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.0397206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:35:20.0397326Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.0397777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.0398055Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.0398491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.0398616Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.0399017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.0399113Z return self._compile_to_module() 2025-12-04T10:35:20.0399612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.0399751Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.0400194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.0400299Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.0400718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.0400923Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.0401419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.0401528Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.0401971Z File "/tmp/tmpnuib74iv/q5/cq53e3gj6wlonaq2mc2btbrb5nbvvvjfyf5jotxgxilvfkujxjrv.py", line 118, in 2025-12-04T10:35:20.0402358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:20.0402447Z kernel.precompile( 2025-12-04T10:35:20.0402918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.0403013Z self._precompile_worker() 2025-12-04T10:35:20.0403562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.0403708Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.0404251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0404415Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0404793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0404998Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0405367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0405697Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0405889Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.0406246Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0406320Z ^ 2025-12-04T10:35:20.0406705Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0406710Z 2025-12-04T10:35:20.0407319Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0407326Z 2025-12-04T10:35:20.0407330Z 2025-12-04T10:35:20.0407508Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0408345Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.0408354Z 2025-12-04T10:35:20.0408580Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0408755Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0408849Z frames [('total', 1)] 2025-12-04T10:35:20.0408940Z stats [('calls_captured', 6)] 2025-12-04T10:35:20.0409137Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0409322Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0409398Z graph_break [] 2025-12-04T10:35:20.0409704Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0409790Z frames [('total', 1)] 2025-12-04T10:35:20.0409879Z stats [('calls_captured', 6)] 2025-12-04T10:35:20.0410058Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0410252Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0410334Z graph_break [] 2025-12-04T10:35:20.0410452Z =================================== FAILURES =================================== 2025-12-04T10:35:20.0410697Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T10:35:20.0410799Z Traceback (most recent call last): 2025-12-04T10:35:20.0411139Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:20.0411263Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:20.0411684Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.0411893Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.0412324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.0412484Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.0412974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.0413091Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.0413547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.0413870Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.0414314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.0414434Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.0414834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.0414931Z return self._compile_to_module() 2025-12-04T10:35:20.0415346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.0415500Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.0415963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.0416072Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.0416487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:20.0416686Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0417180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0417286Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0417702Z File "/tmp/tmppsffx5_k/7v/c7vzaa7nwy65vzavgac7zhgdl3nrjmja5a2yko4dvmr4egroo5ye.py", line 118, in <module>
2025-12-04T10:35:20.0418094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0418181Z kernel.precompile(
2025-12-04T10:35:20.0418646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0418747Z self._precompile_worker()
2025-12-04T10:35:20.0419323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0419552Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0420060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0420220Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0420598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0420807Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0421179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0421464Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0421653Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0422020Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0422089Z ^
2025-12-04T10:35:20.0422473Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0422477Z
2025-12-04T10:35:20.0423085Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0423133Z
2025-12-04T10:35:20.0423137Z
2025-12-04T10:35:20.0423314Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0423963Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0424008Z
2025-12-04T10:35:20.0424230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0424414Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0424494Z frames [('total', 1)]
2025-12-04T10:35:20.0424586Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0424783Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0424964Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0425042Z graph_break []
2025-12-04T10:35:20.0425220Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0425304Z frames [('total', 1)]
2025-12-04T10:35:20.0425393Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0425574Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0425765Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0425853Z graph_break []
2025-12-04T10:35:20.0426025Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0426102Z frames [('total', 1)]
2025-12-04T10:35:20.0426200Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0426378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0426573Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0426657Z graph_break []
2025-12-04T10:35:20.0427218Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.xml -
2025-12-04T10:35:20.0427360Z =========================== short test summary info ============================
2025-12-04T10:35:20.0428000Z FAILED [0.3898s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0428361Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0428434Z ^
2025-12-04T10:35:20.0428925Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0428931Z
2025-12-04T10:35:20.0429540Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0429545Z
2025-12-04T10:35:20.0429549Z
2025-12-04T10:35:20.0429728Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0430374Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0430381Z
2025-12-04T10:35:20.0430604Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0430750Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0430925Z ================== 1 failed, 187 deselected, 2 rerun in 2.84s ==================
2025-12-04T10:35:20.0431002Z Got exit code 1
2025-12-04T10:35:20.0431087Z Retrying single test...
2025-12-04T10:35:20.0431492Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.xml
2025-12-04T10:35:20.0431623Z ============================= test session starts ==============================
2025-12-04T10:35:20.0431915Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0432050Z cachedir: .pytest_cache
2025-12-04T10:35:20.0432495Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0432637Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0432723Z configfile: pytest.ini
2025-12-04T10:35:20.0433180Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0433374Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.0433946Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0434045Z Running 1 items in this shard
2025-12-04T10:35:20.0434049Z
2025-12-04T10:35:20.0435047Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0435838Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0436200Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.0436576Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 320
2025-12-04T10:35:20.0437017Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0437398Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.0437846Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0438300Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0438794Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0439369Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0439842Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0440213Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.0440650Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0441043Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.0441432Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.0441809Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.0442307Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0442742Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0443197Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0443738Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0444255Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0444794Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0445218Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0445622Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0446024Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:20.0446500Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0446872Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:20.0447350Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0447891Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0448490Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0448787Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.0450615Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0451071Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0451957Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0452490Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0453244Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0453819Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0454566Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0455220Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0455798Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0456607Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0456911Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0457673Z E1204 10:20:50.385000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0457783Z ('RERUN', {'yellow': True}) [1.9923s] [100%]
2025-12-04T10:35:20.0458775Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0459558Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0459923Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.0460302Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 320
2025-12-04T10:35:20.0460738Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0461127Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.0461579Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0462041Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0462612Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0463103Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0463576Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0463949Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.0464384Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0464783Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.0465163Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.0465542Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.0466037Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0466472Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0466972Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0467458Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0468007Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0468535Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0468958Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0469350Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0469719Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:20.0470196Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0470561Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:20.0471042Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0471490Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0472085Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0472385Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.0474193Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0474656Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0475538Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0476079Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0476835Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0477409Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0478157Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0478850Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0479405Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0480142Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0480446Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0481205Z E1204 10:20:50.807000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0481313Z ('RERUN', {'yellow': True}) [0.3890s] [100%]
2025-12-04T10:35:20.0482302Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1
2025-12-04T10:35:20.0483037Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0483394Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.0483761Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 320
2025-12-04T10:35:20.0484199Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 512
2025-12-04T10:35:20.0484582Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.0485029Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.0485489Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.0486103Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.0486595Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.0487059Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.0487427Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.0487862Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.0488257Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.0488642Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.0489010Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.0489506Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0)
2025-12-04T10:35:20.0489952Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.0490448Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1])
2025-12-04T10:35:20.0490978Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.0491459Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf"))
2025-12-04T10:35:20.0491985Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.0492413Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32)
2025-12-04T10:35:20.0492802Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7
2025-12-04T10:35:20.0493172Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0
2025-12-04T10:35:20.0493648Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9)
2025-12-04T10:35:20.0494018Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0
2025-12-04T10:35:20.0494499Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11)
2025-12-04T10:35:20.0494945Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv)
2025-12-04T10:35:20.0495543Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None)
2025-12-04T10:35:20.0495891Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.0497707Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.0498160Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.0499093Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0499627Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0500388Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0500959Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0501700Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0502396Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0503033Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0503776Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0504076Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.0504834Z E1204 10:20:51.196000 79578 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0504918Z FAILED [0.3873s] [100%]
2025-12-04T10:35:20.0504923Z
2025-12-04T10:35:20.0505038Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0505297Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0505397Z Traceback (most recent call last):
2025-12-04T10:35:20.0505779Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0505916Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0506325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0506536Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0506968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0507127Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0507560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0507678Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0508262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0508653Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0509097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0509223Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0509631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0509733Z return self._compile_to_module()
2025-12-04T10:35:20.0510142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0510277Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0510717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0510824Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0511255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0511449Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0511945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0512052Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0512575Z File "/tmp/tmpb8tnvy9q/hs/chscfevuhngazyj2gf4j23d7xcdsorzgbozgy5i4eweytdfv4bta.py", line 118, in <module>
2025-12-04T10:35:20.0512963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0513109Z kernel.precompile(
2025-12-04T10:35:20.0513578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0513669Z self._precompile_worker()
2025-12-04T10:35:20.0514183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0514329Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0514837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0515000Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0515381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0515612Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0516013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0516296Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0516491Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0516850Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0516921Z ^
2025-12-04T10:35:20.0517310Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0517315Z
2025-12-04T10:35:20.0517922Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0517927Z 2025-12-04T10:35:20.0517931Z 2025-12-04T10:35:20.0518111Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0518757Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.0518763Z 2025-12-04T10:35:20.0519074Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0519257Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0519341Z frames [('total', 1)] 2025-12-04T10:35:20.0519433Z stats [('calls_captured', 6)] 2025-12-04T10:35:20.0519628Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0519819Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0519895Z graph_break [] 2025-12-04T10:35:20.0520139Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T10:35:20.0520253Z Traceback (most recent call last): 2025-12-04T10:35:20.0520585Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T10:35:20.0520711Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T10:35:20.0521123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.0521330Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.0521763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.0521918Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.0522395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:35:20.0522518Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.0522965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.0523278Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.0523721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.0523839Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.0524242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.0524339Z return self._compile_to_module() 2025-12-04T10:35:20.0524751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.0524886Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.0525318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.0525433Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.0525847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.0526042Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.0526544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.0526646Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.0527099Z File "/tmp/tmpd4ubmg7c/sy/csyatmqpxjyqxwfhahrl4vh7cdfbspreduldqf7qhfakrpfl4hes.py", line 118, in 2025-12-04T10:35:20.0527600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:20.0527730Z kernel.precompile(
2025-12-04T10:35:20.0528680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0529636Z self._precompile_worker()
2025-12-04T10:35:20.0538635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0539885Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0540989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0542156Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0543114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0544172Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0545206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0546339Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0547220Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0548214Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0549077Z ^
2025-12-04T10:35:20.0549713Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0550314Z
2025-12-04T10:35:20.0550922Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0551740Z
2025-12-04T10:35:20.0551744Z
2025-12-04T10:35:20.0551930Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0552867Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0553677Z
2025-12-04T10:35:20.0553899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0554424Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0554804Z frames [('total', 1)]
2025-12-04T10:35:20.0555033Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0555413Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0555940Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0556310Z graph_break []
2025-12-04T10:35:20.0556603Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0556979Z frames [('total', 1)]
2025-12-04T10:35:20.0557204Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0557551Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0558037Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0558417Z graph_break []
2025-12-04T10:35:20.0558651Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0559137Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___
2025-12-04T10:35:20.0559592Z Traceback (most recent call last):
2025-12-04T10:35:20.0560114Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant
2025-12-04T10:35:20.0560682Z y_compiled = compiled_amax_fp8_quant(x, scale)
2025-12-04T10:35:20.0561328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0562055Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0562806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0563513Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0564207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0564952Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0565679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0566519Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0567343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0568017Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0568648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0569262Z return self._compile_to_module()
2025-12-04T10:35:20.0569856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0570510Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0571190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0571840Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0572459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0573181Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0574027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0574737Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0575420Z File "/tmp/tmppv1w55k7/z5/cz5bjnqeovnr7mbxzhf5hcl64pmdawkpjodswz5gb5cju2bmqezn.py", line 118, in <module>
2025-12-04T10:35:20.0576422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.0577018Z kernel.precompile(
2025-12-04T10:35:20.0577637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.0578312Z self._precompile_worker()
2025-12-04T10:35:20.0578985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0579826Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0580583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0581364Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0582021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0582718Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0583411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0584178Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0584763Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0585418Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0585961Z ^
2025-12-04T10:35:20.0586440Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0586941Z
2025-12-04T10:35:20.0587546Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0588263Z
2025-12-04T10:35:20.0588267Z
2025-12-04T10:35:20.0588449Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0589495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0590256Z
2025-12-04T10:35:20.0590478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0590989Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0591367Z frames [('total', 1)]
2025-12-04T10:35:20.0591595Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0591957Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0592450Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0592825Z graph_break []
2025-12-04T10:35:20.0593122Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0593499Z frames [('total', 1)]
2025-12-04T10:35:20.0593725Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0594081Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0594565Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0594943Z graph_break []
2025-12-04T10:35:20.0595233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0595605Z frames [('total', 1)]
2025-12-04T10:35:20.0595831Z stats [('calls_captured', 6)]
2025-12-04T10:35:20.0596223Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0596707Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0597091Z graph_break []
2025-12-04T10:35:20.0597803Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.xml -
2025-12-04T10:35:20.0598606Z =========================== short test summary info ============================
2025-12-04T10:35:20.0599505Z FAILED [0.3873s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.0600600Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0601139Z ^
2025-12-04T10:35:20.0601616Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0602125Z
2025-12-04T10:35:20.0602727Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0603444Z
2025-12-04T10:35:20.0603448Z
2025-12-04T10:35:20.0603630Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0604562Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0605318Z
2025-12-04T10:35:20.0605537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0606022Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0606451Z ================== 1 failed, 187 deselected, 2 rerun in 2.80s ==================
2025-12-04T10:35:20.0606804Z Got exit code 1
2025-12-04T10:35:20.0607371Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.0608456Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.0609320Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.xml
2025-12-04T10:35:20.0609966Z ============================= test session starts ==============================
2025-12-04T10:35:20.0610644Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0611142Z cachedir: .pytest_cache
2025-12-04T10:35:20.0611734Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0612387Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0612667Z configfile: pytest.ini
2025-12-04T10:35:20.0613272Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0614025Z collecting ... collected 188 items / 15 deselected / 173 selected
2025-12-04T10:35:20.0614442Z stepcurrent: skipping 15 already run items.
2025-12-04T10:35:20.0614747Z Running 173 items in this shard
2025-12-04T10:35:20.0614921Z
2025-12-04T10:35:20.0615300Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [1.8094s] [ 0%]
2025-12-04T10:35:20.0616172Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2173s] [ 1%]
2025-12-04T10:35:20.0617046Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.4767s] [ 1%]
2025-12-04T10:35:20.0617921Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2386s] [ 2%]
2025-12-04T10:35:20.0618802Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.4153s] [ 2%]
2025-12-04T10:35:20.0619927Z inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 3%]
2025-12-04T10:35:20.0621053Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.7698s] [ 4%]
2025-12-04T10:35:20.0622347Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6868s] [ 4%]
2025-12-04T10:35:20.0623326Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 FAILED [0.6911s] [ 4%]
2025-12-04T10:35:20.0623801Z
2025-12-04T10:35:20.0623918Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0624393Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0624847Z Traceback (most recent call last):
2025-12-04T10:35:20.0625373Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0625978Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841
2025-12-04T10:35:20.0626616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0627344Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0628097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0628803Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0629501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0630162Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0630834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0631673Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0632502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0633175Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0633810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0634525Z return self._compile_to_module()
2025-12-04T10:35:20.0635129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0635887Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0636558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0637217Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0637842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0638565Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0639366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0640078Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0640714Z File "/tmp/tmpn9o5rqa_/he/che3ee5fwgdgzz2elixqzfqkaog7xykdwzwtlsuwigqocfit3hks.py", line 193, in <module>
2025-12-04T10:35:20.0641686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0642271Z self._wait_futures(scope)
2025-12-04T10:35:20.0642858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0643972Z kernel = result.result()
2025-12-04T10:35:20.0644510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0645132Z return self.result_fn()
2025-12-04T10:35:20.0645698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0646317Z raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0646855Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0647294Z
2025-12-04T10:35:20.0647467Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0647842Z Traceback (most recent call last):
2025-12-04T10:35:20.0648485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0649134Z result = job()
2025-12-04T10:35:20.0649762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0650493Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0651183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0651856Z self._precompile_worker()
2025-12-04T10:35:20.0652692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0653730Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0654610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0655543Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0656385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0657145Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0658058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0658990Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0659726Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0660563Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0661325Z ^
2025-12-04T10:35:20.0661941Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0662479Z
2025-12-04T10:35:20.0662483Z
2025-12-04T10:35:20.0663142Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0663948Z
2025-12-04T10:35:20.0663951Z
2025-12-04T10:35:20.0664183Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0665201Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0666028Z
2025-12-04T10:35:20.0666277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0666986Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0667522Z frames [('total', 1)]
2025-12-04T10:35:20.0667845Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0668391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0669350Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0670202Z graph_break []
2025-12-04T10:35:20.0670666Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0671300Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0671830Z Traceback (most recent call last):
2025-12-04T10:35:20.0672523Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0673198Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841
2025-12-04T10:35:20.0674013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0674908Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0675763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0676634Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0677448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0678204Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0679024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0679981Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0680937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0681707Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0682457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0683202Z return self._compile_to_module()
2025-12-04T10:35:20.0684036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0684762Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0685571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0686426Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0687227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0688054Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0689011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0689847Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0690553Z File "/tmp/tmp4w4t2s34/t3/ct3eoqyx4525zlli6efa35d6a67do2d25lzzayqzlgzmidr2bec6.py", line 193, in
2025-12-04T10:35:20.0691654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0692349Z self._wait_futures(scope)
2025-12-04T10:35:20.0693023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0693890Z kernel = result.result()
2025-12-04T10:35:20.0694541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0695204Z return self.result_fn()
2025-12-04T10:35:20.0696019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0696708Z raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0697323Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0697931Z
2025-12-04T10:35:20.0698189Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0698630Z Traceback (most recent call last):
2025-12-04T10:35:20.0699447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0700319Z result = job()
2025-12-04T10:35:20.0701045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0701939Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0702811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0703606Z self._precompile_worker()
2025-12-04T10:35:20.0704424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0705273Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0706210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0707136Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0708075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0708869Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0709718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0710609Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0711370Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0712149Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0712889Z ^
2025-12-04T10:35:20.0713472Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0713996Z
2025-12-04T10:35:20.0714000Z
2025-12-04T10:35:20.0714742Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0715567Z
2025-12-04T10:35:20.0715738Z
2025-12-04T10:35:20.0715967Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0716959Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0717785Z
2025-12-04T10:35:20.0718078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0718698Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0719136Z frames [('total', 1)]
2025-12-04T10:35:20.0719548Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0720081Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0721025Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0721928Z graph_break []
2025-12-04T10:35:20.0722303Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0722822Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0723386Z frames [('total', 1)]
2025-12-04T10:35:20.0723687Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0724143Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0725192Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0726126Z graph_break []
2025-12-04T10:35:20.0726499Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0727175Z =================================== FAILURES ===================================
2025-12-04T10:35:20.0727782Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0728283Z Traceback (most recent call last):
2025-12-04T10:35:20.0728977Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0729668Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841
2025-12-04T10:35:20.0730356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0731259Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0732140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0732988Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0733774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0734561Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0735383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0736485Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0737380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0738211Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0738986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0739759Z return self._compile_to_module()
2025-12-04T10:35:20.0740469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0741271Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0742046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0742972Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0743671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0744499Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0745554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0746379Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0747085Z File "/tmp/tmpzz12oiau/3z/c3zpuujs6tubvaxkxduwi267o25fgvl76andvjwe7kffrs5h5o4a.py", line 193, in
2025-12-04T10:35:20.0748193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0748881Z self._wait_futures(scope)
2025-12-04T10:35:20.0749545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0750347Z kernel = result.result()
2025-12-04T10:35:20.0750984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0751677Z return self.result_fn()
2025-12-04T10:35:20.0752380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0753153Z raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0753795Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0754400Z
2025-12-04T10:35:20.0754654Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0755143Z Traceback (most recent call last):
2025-12-04T10:35:20.0755952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0756777Z result = job()
2025-12-04T10:35:20.0757539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0758321Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0759182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0759979Z self._precompile_worker()
2025-12-04T10:35:20.0760708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0761646Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0762530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0763536Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0764277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0765098Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0765996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0766903Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0767517Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0768340Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0769073Z ^
2025-12-04T10:35:20.0769654Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0770176Z
2025-12-04T10:35:20.0770180Z
2025-12-04T10:35:20.0770958Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0771788Z
2025-12-04T10:35:20.0771792Z
2025-12-04T10:35:20.0772005Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0773060Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0773792Z
2025-12-04T10:35:20.0774179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0774798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0775239Z frames [('total', 1)]
2025-12-04T10:35:20.0775652Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0776102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0777042Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0777981Z graph_break []
2025-12-04T10:35:20.0778346Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0778888Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0779457Z frames [('total', 1)]
2025-12-04T10:35:20.0779859Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0780371Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0781403Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0782284Z graph_break []
2025-12-04T10:35:20.0782633Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0783244Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.0783710Z frames [('total', 1)]
2025-12-04T10:35:20.0784028Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.0784547Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.0785551Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.0786374Z graph_break []
2025-12-04T10:35:20.0786801Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.0787810Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.xml -
2025-12-04T10:35:20.0788763Z =========================== short test summary info ============================
2025-12-04T10:35:20.0789879Z FAILED [0.6911s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0790805Z
2025-12-04T10:35:20.0791008Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0791520Z Traceback (most recent call last):
2025-12-04T10:35:20.0792312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0793026Z result = job()
2025-12-04T10:35:20.0793783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0794655Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0795412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0796142Z self._precompile_worker()
2025-12-04T10:35:20.0796827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0802941Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0803762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0804564Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0805234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0806014Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0806717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0807496Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0808242Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0808939Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.0809547Z ^
2025-12-04T10:35:20.0810035Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.0810550Z
2025-12-04T10:35:20.0810554Z
2025-12-04T10:35:20.0811160Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.0811972Z
2025-12-04T10:35:20.0811976Z
2025-12-04T10:35:20.0812159Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.0813115Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0813825Z
2025-12-04T10:35:20.0814057Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.0814560Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.0815036Z ======== 1 failed, 5 passed, 1 skipped, 15 deselected, 2 rerun in 5.35s ========
2025-12-04T10:35:20.0815457Z Got exit code 1
2025-12-04T10:35:20.0815712Z Retrying single test...
2025-12-04T10:35:20.0816272Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.xml
2025-12-04T10:35:20.0816937Z ============================= test session starts ==============================
2025-12-04T10:35:20.0817497Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.0817998Z cachedir: .pytest_cache
2025-12-04T10:35:20.0818598Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.0819312Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.0819599Z configfile: pytest.ini
2025-12-04T10:35:20.0820223Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.0820987Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.0821816Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.0822546Z Running 1 items in this shard
2025-12-04T10:35:20.0822731Z
2025-12-04T10:35:20.0823537Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:13.244626906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0824456Z
2025-12-04T10:35:20.0824897Z [W1204 10:21:22.874945366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0825461Z
2025-12-04T10:35:20.0826089Z [W1204 10:21:22.875200301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0826640Z
2025-12-04T10:35:20.0827076Z [W1204 10:21:22.877505266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0827630Z
2025-12-04T10:35:20.0828067Z [W1204 10:21:22.877692230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0828626Z
2025-12-04T10:35:20.0829053Z [W1204 10:21:22.879827542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0829609Z
2025-12-04T10:35:20.0830044Z [W1204 10:21:22.880170298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0830595Z
2025-12-04T10:35:20.0831040Z [W1204 10:21:22.880345852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0831590Z
2025-12-04T10:35:20.0832033Z [W1204 10:21:22.880756270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0832587Z
2025-12-04T10:35:20.0833013Z [W1204 10:21:22.880926393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0833618Z
2025-12-04T10:35:20.0834054Z [W1204 10:21:22.881389692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0834725Z
2025-12-04T10:35:20.0835155Z [W1204 10:21:22.881559336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0835705Z
2025-12-04T10:35:20.0836148Z [W1204 10:21:22.881907692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0836699Z
2025-12-04T10:35:20.0837140Z [W1204 10:21:22.882075416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0837694Z
2025-12-04T10:35:20.0838125Z [W1204 10:21:22.882399012 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0838684Z
2025-12-04T10:35:20.0839112Z [W1204 10:21:22.882564115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0839674Z
2025-12-04T10:35:20.0840105Z [W1204 10:21:22.882883152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0840657Z
2025-12-04T10:35:20.0841100Z [W1204 10:21:22.883053455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0841650Z
2025-12-04T10:35:20.0841765Z ('RERUN', {'yellow': True}) [11.8361s] [100%]
2025-12-04T10:35:20.0842783Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:23.018839188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0843704Z
2025-12-04T10:35:20.0844133Z [W1204 10:21:23.019208415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0844694Z
2025-12-04T10:35:20.0845131Z [W1204 10:21:23.019377728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0845740Z
2025-12-04T10:35:20.0846180Z [W1204 10:21:23.019854368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0846736Z
2025-12-04T10:35:20.0847260Z [W1204 10:21:24.020048411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0847816Z
2025-12-04T10:35:20.0848247Z [W1204 10:21:24.020354697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0848809Z
2025-12-04T10:35:20.0849243Z [W1204 10:21:24.020599552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0849802Z
2025-12-04T10:35:20.0850236Z [W1204 10:21:24.020759475 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0850800Z
2025-12-04T10:35:20.0851239Z [W1204 10:21:24.021125812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0851789Z
2025-12-04T10:35:20.0852236Z [W1204 10:21:24.021293376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0852789Z
2025-12-04T10:35:20.0853232Z [W1204 10:21:24.021666163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0853782Z
2025-12-04T10:35:20.0854216Z [W1204 10:21:24.021833726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0854825Z
2025-12-04T10:35:20.0855256Z [W1204 10:21:24.022153082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0855904Z
2025-12-04T10:35:20.0856333Z [W1204 10:21:24.022318346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0856887Z
2025-12-04T10:35:20.0857329Z [W1204 10:21:24.022617902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0857881Z
2025-12-04T10:35:20.0858318Z [W1204 10:21:24.022780575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0858866Z
2025-12-04T10:35:20.0859348Z [W1204 10:21:24.023081891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0859907Z
2025-12-04T10:35:20.0860338Z [W1204 10:21:24.023264954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0860899Z
2025-12-04T10:35:20.0861006Z ('RERUN', {'yellow': True}) [0.6955s] [100%]
2025-12-04T10:35:20.0862027Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:24.717025252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0862934Z
2025-12-04T10:35:20.0863378Z [W1204 10:21:24.717378399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0863932Z
2025-12-04T10:35:20.0864361Z [W1204 10:21:24.717546002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0864921Z
2025-12-04T10:35:20.0865349Z [W1204 10:21:24.718013861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0865903Z
2025-12-04T10:35:20.0866337Z [W1204 10:21:24.718188945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0866887Z
2025-12-04T10:35:20.0867412Z [W1204 10:21:24.718487601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0867965Z
2025-12-04T10:35:20.0868403Z [W1204 10:21:24.718730325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0868956Z
2025-12-04T10:35:20.0869384Z [W1204 10:21:24.718891319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0869945Z
2025-12-04T10:35:20.0870374Z [W1204 10:21:24.719281516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0870931Z
2025-12-04T10:35:20.0871365Z [W1204 10:21:24.719450250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0871915Z
2025-12-04T10:35:20.0872350Z [W1204 10:21:24.719817857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0872906Z
2025-12-04T10:35:20.0873340Z [W1204 10:21:24.719983540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0873896Z
2025-12-04T10:35:20.0874327Z [W1204 10:21:24.720340757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0874879Z
2025-12-04T10:35:20.0875359Z [W1204 10:21:24.720510120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0875968Z
2025-12-04T10:35:20.0876397Z [W1204 10:21:24.720812906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0876992Z
2025-12-04T10:35:20.0877429Z [W1204 10:21:24.720978199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0877982Z
2025-12-04T10:35:20.0878422Z [W1204 10:21:24.721287535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0878973Z
2025-12-04T10:35:20.0879404Z [W1204 10:21:24.721452069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.0879959Z
2025-12-04T10:35:20.0880039Z FAILED [0.7148s] [100%]
2025-12-04T10:35:20.0880196Z
2025-12-04T10:35:20.0880316Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.0880809Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________
2025-12-04T10:35:20.0881267Z Traceback (most recent call last):
2025-12-04T10:35:20.0881797Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.0882368Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841
2025-12-04T10:35:20.0883026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.0883765Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.0884537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.0885265Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.0886026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.0886711Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.0887409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.0888266Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.0889193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.0889883Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.0890531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.0891159Z return self._compile_to_module()
2025-12-04T10:35:20.0891764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.0892442Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.0893144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.0893810Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.0894457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.0895199Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.0896023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.0896752Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.0897426Z File "/tmp/tmpmuh47gt6/xf/cxfpjbopqoo6er7nay4wy7kqrqyhdkgfou7cxikujsoorskbn76t.py", line 193, in
2025-12-04T10:35:20.0898437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.0899097Z self._wait_futures(scope)
2025-12-04T10:35:20.0899698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.0900414Z kernel = result.result()
2025-12-04T10:35:20.0900966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.0901557Z return self.result_fn()
2025-12-04T10:35:20.0902143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.0902790Z raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.0903342Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.0903803Z
2025-12-04T10:35:20.0903987Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.0904389Z Traceback (most recent call last):
2025-12-04T10:35:20.0905059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.0905768Z result = job()
2025-12-04T10:35:20.0906421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.0907172Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.0908035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.0908726Z self._precompile_worker()
2025-12-04T10:35:20.0909412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.0910200Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.0910973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.0911769Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.0912447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.0913162Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.0913987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.0914774Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.0915335Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.0916069Z def
triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0916675Z ^ 2025-12-04T10:35:20.0917161Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0917166Z 2025-12-04T10:35:20.0917171Z 2025-12-04T10:35:20.0917853Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0917865Z 2025-12-04T10:35:20.0917870Z 2025-12-04T10:35:20.0918116Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0918892Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T10:35:20.0918911Z 2025-12-04T10:35:20.0919206Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0919392Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0919575Z frames [('total', 1)] 2025-12-04T10:35:20.0919673Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.0920242Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0920494Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0920573Z graph_break [] 2025-12-04T10:35:20.0920730Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.0920911Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:35:20.0921960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not 
in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:35:20.0922065Z if out == self.unknown_value: 2025-12-04T10:35:20.0922306Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T10:35:20.0922408Z Traceback (most recent call last): 2025-12-04T10:35:20.0922747Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T10:35:20.0922866Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T10:35:20.0923290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.0923505Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.0923942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.0924108Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.0924543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.0924671Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.0925124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.0925398Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.0925851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.0925974Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.0926467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.0926572Z return self._compile_to_module() 2025-12-04T10:35:20.0926983Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.0927130Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.0927574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.0927680Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.0928101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.0928300Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.0928808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.0928911Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.0929357Z File "/tmp/tmpkmvylrrf/em/cemmtpb4skdnjxt2ufsdlm7xfsxvsgbunm3eh5n6njfmsvuxg3my.py", line 193, in 2025-12-04T10:35:20.0929748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.0929847Z self._wait_futures(scope) 2025-12-04T10:35:20.0930308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.0930410Z kernel = result.result() 2025-12-04T10:35:20.0930826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.0930931Z return self.result_fn() 2025-12-04T10:35:20.0931338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.0931449Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.0931777Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.0931783Z 
2025-12-04T10:35:20.0932061Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.0932167Z Traceback (most recent call last): 2025-12-04T10:35:20.0932623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.0932711Z result = job() 2025-12-04T10:35:20.0933215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.0933335Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.0933803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.0933907Z self._precompile_worker() 2025-12-04T10:35:20.0934408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.0934568Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.0935071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0935236Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0935622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0935828Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0936213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0936583Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0936742Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.0937164Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, 
load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0937233Z ^ 2025-12-04T10:35:20.0937618Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0937629Z 2025-12-04T10:35:20.0937633Z 2025-12-04T10:35:20.0938236Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0938243Z 2025-12-04T10:35:20.0938247Z 2025-12-04T10:35:20.0938431Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0939088Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T10:35:20.0939093Z 2025-12-04T10:35:20.0939317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0939500Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0939584Z frames [('total', 1)] 2025-12-04T10:35:20.0939681Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.0940257Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0940489Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0940609Z graph_break [] 2025-12-04T10:35:20.0940758Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.0940938Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:35:20.0941985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at 
/var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:35:20.0942080Z if out == self.unknown_value: 2025-12-04T10:35:20.0942259Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0942349Z frames [('total', 1)] 2025-12-04T10:35:20.0942445Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.0942640Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0943203Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0943283Z graph_break [] 2025-12-04T10:35:20.0943431Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.0943548Z =================================== FAILURES =================================== 2025-12-04T10:35:20.0943791Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T10:35:20.0943896Z Traceback (most recent call last): 2025-12-04T10:35:20.0944225Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T10:35:20.0944347Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T10:35:20.0944762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.0944973Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.0945437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.0945627Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.0946074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.0946276Z mb_compiled_graph = fx_codegen_and_compile( 
2025-12-04T10:35:20.0946729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.0947015Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.0947457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.0947587Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.0948007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.0948110Z return self._compile_to_module() 2025-12-04T10:35:20.0948532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.0948669Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.0949113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.0949227Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.0949645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.0949850Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.0950423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.0950525Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.0951012Z File "/tmp/tmpt5hrd8p2/ef/cefo55iyjfzqwrbb6wixkhlpge6vd5bfdnzv37ebwsiu33u3x45j.py", line 193, in 2025-12-04T10:35:20.0951395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.0951493Z self._wait_futures(scope) 2025-12-04T10:35:20.0951924Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.0952020Z kernel = result.result() 2025-12-04T10:35:20.0952405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.0952500Z return self.result_fn() 2025-12-04T10:35:20.0952909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.0953024Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.0953351Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.0953356Z 2025-12-04T10:35:20.0953540Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.0953646Z Traceback (most recent call last): 2025-12-04T10:35:20.0954117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.0954204Z result = job() 2025-12-04T10:35:20.0954703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.0954820Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.0955292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.0955388Z self._precompile_worker() 2025-12-04T10:35:20.0955898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.0956046Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.0956631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0956802Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0957184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0957386Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0957764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0958046Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0958205Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.0958627Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.0958696Z ^ 2025-12-04T10:35:20.0959096Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0959101Z 2025-12-04T10:35:20.0959105Z 2025-12-04T10:35:20.0959711Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0959716Z 2025-12-04T10:35:20.0959720Z 2025-12-04T10:35:20.0959911Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0960551Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T10:35:20.0960556Z 2025-12-04T10:35:20.0960788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0961009Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0961088Z frames [('total', 1)] 2025-12-04T10:35:20.0961191Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.0961766Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0961960Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0962045Z graph_break [] 2025-12-04T10:35:20.0962189Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.0962371Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:35:20.0963408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:35:20.0963504Z if out == self.unknown_value: 2025-12-04T10:35:20.0963684Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0963771Z frames [('total', 1)] 2025-12-04T10:35:20.0963876Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.0964062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0964619Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0964709Z graph_break [] 2025-12-04T10:35:20.0964856Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.0965030Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.0965116Z frames [('total', 1)] 2025-12-04T10:35:20.0965206Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.0965394Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.0966001Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.0966159Z graph_break [] 2025-12-04T10:35:20.0966314Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.0966868Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.xml - 2025-12-04T10:35:20.0967010Z =========================== short test summary info ============================ 2025-12-04T10:35:20.0967767Z FAILED [0.7148s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.0967772Z 2025-12-04T10:35:20.0967949Z 
Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.0968059Z Traceback (most recent call last): 2025-12-04T10:35:20.0968527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.0968610Z result = job() 2025-12-04T10:35:20.0969134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.0969254Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.0969736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.0969872Z self._precompile_worker() 2025-12-04T10:35:20.0970378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.0970535Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.0971080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.0971254Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.0971645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.0971857Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.0972244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.0972533Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.0972696Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.0973128Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK 
: tl.constexpr): 2025-12-04T10:35:20.0973201Z ^ 2025-12-04T10:35:20.0973604Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.0973609Z 2025-12-04T10:35:20.0973613Z 2025-12-04T10:35:20.0974226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.0974230Z 2025-12-04T10:35:20.0974234Z 2025-12-04T10:35:20.0974427Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.0975027Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T10:35:20.0975034Z 2025-12-04T10:35:20.0975260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.0975430Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.0975616Z ================= 1 failed, 187 deselected, 2 rerun in 13.28s ================== 2025-12-04T10:35:20.0975721Z Got exit code 1 2025-12-04T10:35:20.0975822Z Retrying single test... 
2025-12-04T10:35:20.0976322Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.xml 2025-12-04T10:35:20.0976473Z ============================= test session starts ============================== 2025-12-04T10:35:20.0976766Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.0976856Z cachedir: .pytest_cache 2025-12-04T10:35:20.0977311Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.0977423Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.0977528Z configfile: pytest.ini 2025-12-04T10:35:20.0977991Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.0978179Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.0978718Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T10:35:20.0978813Z Running 1 items in this shard 2025-12-04T10:35:20.0978818Z 2025-12-04T10:35:20.0979703Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:33.898327559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0979708Z 2025-12-04T10:35:20.0980148Z [W1204 10:21:43.423726291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0980197Z 2025-12-04T10:35:20.0980634Z [W1204 10:21:43.423962555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.0980687Z 2025-12-04T10:35:20.0981119Z [W1204 10:21:43.426211739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0981124Z 2025-12-04T10:35:20.0981557Z [W1204 10:21:43.426400383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0981562Z 2025-12-04T10:35:20.0982002Z [W1204 10:21:43.428510754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0982006Z 2025-12-04T10:35:20.0982434Z [W1204 10:21:43.428793240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0982441Z 2025-12-04T10:35:20.0982880Z [W1204 10:21:43.428953803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0982888Z 2025-12-04T10:35:20.0983324Z [W1204 10:21:43.429358051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0983328Z 2025-12-04T10:35:20.0983773Z [W1204 10:21:43.429527394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0983779Z 2025-12-04T10:35:20.0984211Z [W1204 10:21:43.429985833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0984216Z 2025-12-04T10:35:20.0984659Z [W1204 10:21:43.430208658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0984666Z 2025-12-04T10:35:20.0985092Z [W1204 10:21:43.430569305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.0985099Z 2025-12-04T10:35:20.0985536Z [W1204 10:21:43.430734908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0985541Z 2025-12-04T10:35:20.0986082Z [W1204 10:21:43.431045164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0986087Z 2025-12-04T10:35:20.0986523Z [W1204 10:21:43.431217547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0986527Z 2025-12-04T10:35:20.0986969Z [W1204 10:21:43.431533303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0986976Z 2025-12-04T10:35:20.0987404Z [W1204 10:21:43.431703827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0987410Z 2025-12-04T10:35:20.0987525Z ('RERUN', {'yellow': True}) [11.7088s] [100%] 2025-12-04T10:35:20.0988329Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:44.559025563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0988338Z 2025-12-04T10:35:20.0988785Z [W1204 10:21:44.559387380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0988789Z 2025-12-04T10:35:20.0989227Z [W1204 10:21:44.559555853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0989275Z 2025-12-04T10:35:20.0989706Z [W1204 10:21:44.560054593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.0989722Z 2025-12-04T10:35:20.0990152Z [W1204 10:21:44.560232396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0990199Z 2025-12-04T10:35:20.0990637Z [W1204 10:21:44.560530222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0990641Z 2025-12-04T10:35:20.0991084Z [W1204 10:21:44.560780787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0991088Z 2025-12-04T10:35:20.0991517Z [W1204 10:21:44.560937820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0991521Z 2025-12-04T10:35:20.0991963Z [W1204 10:21:44.561296617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0991967Z 2025-12-04T10:35:20.0992402Z [W1204 10:21:44.561462610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0992408Z 2025-12-04T10:35:20.0992851Z [W1204 10:21:44.561834858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0992855Z 2025-12-04T10:35:20.0993292Z [W1204 10:21:44.562000031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0993296Z 2025-12-04T10:35:20.0993729Z [W1204 10:21:44.562321207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0993743Z 2025-12-04T10:35:20.0994181Z [W1204 10:21:44.562484950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.0994188Z 2025-12-04T10:35:20.0994614Z [W1204 10:21:44.562782536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0994621Z 2025-12-04T10:35:20.0995051Z [W1204 10:21:44.562949909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0995055Z 2025-12-04T10:35:20.0995569Z [W1204 10:21:44.563259285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0995574Z 2025-12-04T10:35:20.0996023Z [W1204 10:21:44.563430759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0996027Z 2025-12-04T10:35:20.0996135Z ('RERUN', {'yellow': True}) [0.6979s] [100%] 2025-12-04T10:35:20.0996937Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 10:21:45.271283411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0996945Z 2025-12-04T10:35:20.0997379Z [W1204 10:21:45.271658639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0997383Z 2025-12-04T10:35:20.0997829Z [W1204 10:21:45.271827962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0997834Z 2025-12-04T10:35:20.0998259Z [W1204 10:21:45.273514045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0998264Z 2025-12-04T10:35:20.0998696Z [W1204 10:21:45.273690968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.0998742Z 2025-12-04T10:35:20.0999191Z [W1204 10:21:45.274029415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0999196Z 2025-12-04T10:35:20.0999668Z [W1204 10:21:45.274302490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.0999672Z 2025-12-04T10:35:20.1000124Z [W1204 10:21:45.274465444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1000128Z 2025-12-04T10:35:20.1000559Z [W1204 10:21:45.274906802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1000564Z 2025-12-04T10:35:20.1001005Z [W1204 10:21:45.275076675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1001012Z 2025-12-04T10:35:20.1001446Z [W1204 10:21:45.275514584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1001450Z 2025-12-04T10:35:20.1001890Z [W1204 10:21:45.275683797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1001898Z 2025-12-04T10:35:20.1002326Z [W1204 10:21:45.276231038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1002335Z 2025-12-04T10:35:20.1002763Z [W1204 10:21:45.276401051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1002774Z 2025-12-04T10:35:20.1003212Z [W1204 10:21:45.276861950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.1003216Z 2025-12-04T10:35:20.1003653Z [W1204 10:21:45.277030664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1003657Z 2025-12-04T10:35:20.1004092Z [W1204 10:21:45.277399771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1004099Z 2025-12-04T10:35:20.1004530Z [W1204 10:21:45.277567154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1004534Z 2025-12-04T10:35:20.1004703Z FAILED [0.7096s] [100%] 2025-12-04T10:35:20.1004708Z 2025-12-04T10:35:20.1004833Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.1005077Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T10:35:20.1005192Z Traceback (most recent call last): 2025-12-04T10:35:20.1005529Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T10:35:20.1005685Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T10:35:20.1006125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1006339Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1006787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1006954Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1007408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1007531Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1008176Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1008525Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1008966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1009145Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1009561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1009662Z return self._compile_to_module() 2025-12-04T10:35:20.1010093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1010228Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1010665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1010776Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1011196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1011403Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1011976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1012084Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1012535Z File "/tmp/tmpt3ffkjt4/wu/cwulgp2m4lgii7pneh6iwn2fog3jajfg7bbwpv7q5q7ouztflghj.py", line 193, in <module> 2025-12-04T10:35:20.1012925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.1013022Z self._wait_futures(scope) 2025-12-04T10:35:20.1013453Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.1013548Z kernel = result.result() 2025-12-04T10:35:20.1013937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.1014034Z return self.result_fn() 2025-12-04T10:35:20.1014444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.1014562Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.1014891Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.1014897Z 2025-12-04T10:35:20.1015193Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.1015294Z Traceback (most recent call last): 2025-12-04T10:35:20.1015751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.1015835Z result = job() 2025-12-04T10:35:20.1016331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.1016452Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.1016928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.1017029Z self._precompile_worker() 2025-12-04T10:35:20.1017551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1017704Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1018213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1018393Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1018778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1019074Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1019452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1019780Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1019945Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1020369Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1020436Z ^ 2025-12-04T10:35:20.1020836Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1020841Z 2025-12-04T10:35:20.1020845Z 2025-12-04T10:35:20.1021455Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1021463Z 2025-12-04T10:35:20.1021467Z 2025-12-04T10:35:20.1021650Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1022251Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T10:35:20.1022258Z 2025-12-04T10:35:20.1022488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1022672Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1022761Z frames [('total', 1)] 2025-12-04T10:35:20.1022866Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1023430Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1023625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1023704Z graph_break [] 2025-12-04T10:35:20.1023852Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1024044Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:35:20.1025086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:35:20.1025298Z if out == self.unknown_value: 2025-12-04T10:35:20.1025554Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T10:35:20.1025657Z Traceback (most recent call last): 2025-12-04T10:35:20.1025999Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T10:35:20.1026117Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T10:35:20.1026535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1026756Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1027191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1027364Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1027797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1027917Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1028384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1028656Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1029097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1029268Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1029669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1029820Z return self._compile_to_module() 2025-12-04T10:35:20.1030227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 
2025-12-04T10:35:20.1030370Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1030810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1030917Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1031340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1031543Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1032040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1032158Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1032596Z File "/tmp/tmpambjhoyd/l4/cl4cyoufkny46ifn7zy4my4osg3vzcqmkwieubma7tvyppx4f7v2.py", line 193, in <module> 2025-12-04T10:35:20.1032980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.1033088Z self._wait_futures(scope) 2025-12-04T10:35:20.1033516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.1033608Z kernel = result.result() 2025-12-04T10:35:20.1033987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.1034079Z return self.result_fn() 2025-12-04T10:35:20.1034494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.1034605Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.1034935Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.1034943Z 2025-12-04T10:35:20.1035116Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.1035214Z Traceback (most
recent call last): 2025-12-04T10:35:20.1035809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.1035890Z result = job() 2025-12-04T10:35:20.1036388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.1036506Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.1036978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.1037071Z self._precompile_worker() 2025-12-04T10:35:20.1037571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1037724Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1038233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1038398Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1038783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1038985Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1039363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1039780Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1039935Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1040390Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1040470Z ^ 2025-12-04T10:35:20.1040860Z ValueError("type fp8e4nv not 
supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1040865Z 2025-12-04T10:35:20.1040870Z 2025-12-04T10:35:20.1041481Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1041486Z 2025-12-04T10:35:20.1041490Z 2025-12-04T10:35:20.1041667Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1042271Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T10:35:20.1042275Z 2025-12-04T10:35:20.1042500Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1042676Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1042764Z frames [('total', 1)] 2025-12-04T10:35:20.1042856Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1043422Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1043615Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1043694Z graph_break [] 2025-12-04T10:35:20.1043840Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1044023Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:35:20.1045065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:35:20.1045169Z if out == self.unknown_value: 2025-12-04T10:35:20.1045343Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1045506Z frames [('total', 1)] 2025-12-04T10:35:20.1045601Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1045810Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1046404Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1046486Z graph_break [] 2025-12-04T10:35:20.1046634Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1046758Z =================================== FAILURES =================================== 2025-12-04T10:35:20.1046998Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T10:35:20.1047105Z Traceback (most recent call last): 2025-12-04T10:35:20.1047434Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T10:35:20.1047551Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T10:35:20.1047965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1048177Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1048613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1048784Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1049256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1049377Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1049875Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1050145Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1050593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1050712Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1051120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1051217Z return self._compile_to_module() 2025-12-04T10:35:20.1051631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1051766Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1052198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1052305Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1052724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1052920Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1053420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1053520Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1053946Z File "/tmp/tmpfyq3_txa/yj/cyjxtrojtuefmvvz55mw3yodhqgvovybyvjxpy3euykm72uc2sv7.py", line 193, in <module> 2025-12-04T10:35:20.1054334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.1054425Z self._wait_futures(scope) 2025-12-04T10:35:20.1054857Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.1054952Z kernel = result.result() 2025-12-04T10:35:20.1055333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.1055540Z return self.result_fn() 2025-12-04T10:35:20.1055962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.1056065Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.1056390Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.1056394Z 2025-12-04T10:35:20.1056566Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.1060925Z Traceback (most recent call last): 2025-12-04T10:35:20.1061420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.1061505Z result = job() 2025-12-04T10:35:20.1062017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.1062136Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.1062634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.1062737Z self._precompile_worker() 2025-12-04T10:35:20.1063246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1063413Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1063995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1064162Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1064621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1064825Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1065217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1065526Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1065708Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1066144Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1066220Z ^ 2025-12-04T10:35:20.1066612Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1066617Z 2025-12-04T10:35:20.1066623Z 2025-12-04T10:35:20.1067231Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1067236Z 2025-12-04T10:35:20.1067240Z 2025-12-04T10:35:20.1067423Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1068022Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T10:35:20.1068027Z 2025-12-04T10:35:20.1068248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1068436Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1068522Z frames [('total', 1)] 2025-12-04T10:35:20.1068615Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1069183Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1069371Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1069466Z graph_break [] 2025-12-04T10:35:20.1069618Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1069886Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:35:20.1070933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:35:20.1071029Z if out == self.unknown_value:
2025-12-04T10:35:20.1071215Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1071299Z frames [('total', 1)]
2025-12-04T10:35:20.1071393Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1071590Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1072155Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1072246Z graph_break []
2025-12-04T10:35:20.1072400Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1072574Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1072666Z frames [('total', 1)]
2025-12-04T10:35:20.1072760Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1072949Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1073561Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1073639Z graph_break []
2025-12-04T10:35:20.1073822Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1074388Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.xml -
2025-12-04T10:35:20.1074533Z =========================== short test summary info ============================
2025-12-04T10:35:20.1075294Z FAILED [0.7096s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1075299Z
2025-12-04T10:35:20.1075487Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1075598Z Traceback (most recent call last):
2025-12-04T10:35:20.1076101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1076181Z result = job()
2025-12-04T10:35:20.1076695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1076816Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1077289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1077397Z self._precompile_worker()
2025-12-04T10:35:20.1077908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1078058Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1078567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1078732Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1079114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1079318Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1079692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1080065Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1080225Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1080657Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1080728Z ^
2025-12-04T10:35:20.1081118Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1081125Z
2025-12-04T10:35:20.1081129Z
2025-12-04T10:35:20.1081742Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1081750Z
2025-12-04T10:35:20.1081754Z
2025-12-04T10:35:20.1081933Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1082540Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.1082545Z
2025-12-04T10:35:20.1082772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1082930Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
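Both failing tests in this part of the log (the bfloat16 and float16 variants of test_eager_fallback) stop at the same Triton error: the generated kernel stores to a '*fp8e4nv' pointer (fp8e4nv is the Triton type Inductor emits for torch.float8_e4m3fn), while the kernel metadata reports 'cc': 86. The following minimal sketch models the capability check that appears to produce this message. It assumes, consistently with the supported-dtype list printed above, that Triton's NVIDIA backend always offers fp8e5 and fp8e4b15 and adds fp8e4nv only for compute capability 8.9 and newer. supported_fp8_dtypes and check_kernel_dtype are hypothetical helpers, not Triton APIs.

```python
# Hypothetical model of the fp8 dtype check behind the ValueError in this log.
# ASSUMPTION: mirrors Triton's NVIDIA backend, which is taken here to always
# offer "fp8e5" and "fp8e4b15" and to add "fp8e4nv" only for sm_89 and newer.
# These helpers are illustrative; they are not part of Triton's public API.

BASE_FP8_DTYPES = ("fp8e5", "fp8e4b15")


def supported_fp8_dtypes(capability: int) -> tuple[str, ...]:
    """Return the fp8 dtypes assumed to be available for an sm_<capability> GPU."""
    dtypes = set(BASE_FP8_DTYPES)
    if capability >= 89:  # e.g. Ada (sm_89) and Hopper (sm_90)
        dtypes.add("fp8e4nv")
    # Sorting reproduces the tuple order printed in the log.
    return tuple(sorted(dtypes))


def check_kernel_dtype(dtype: str, capability: int) -> None:
    """Raise a ValueError shaped like the log's message if dtype is unsupported."""
    supported = supported_fp8_dtypes(capability)
    if dtype.startswith("fp8") and dtype not in supported:
        raise ValueError(
            f"type {dtype} not supported in this architecture. "
            f"The supported fp8 dtypes are {supported}"
        )


if __name__ == "__main__":
    # The failing kernel's metadata reports 'cc': 86 and a '*fp8e4nv' output pointer.
    print(supported_fp8_dtypes(86))  # ('fp8e4b15', 'fp8e5')
    try:
        check_kernel_dtype("fp8e4nv", 86)
    except ValueError as err:
        print(err)
    check_kernel_dtype("fp8e4nv", 90)  # passes under the sm_89+ assumption
```

Under that assumption, the same kernel would be expected to compile on an sm_89 or sm_90 device, and a guard such as torch.cuda.get_device_capability() >= (8, 9) is one way a test could avoid this code path on sm_86 runners.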
2025-12-04T10:35:20.1083102Z ================= 1 failed, 187 deselected, 2 rerun in 13.15s ==================
2025-12-04T10:35:20.1083224Z Got exit code 1
2025-12-04T10:35:20.1083616Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16
2025-12-04T10:35:20.1083964Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.1084403Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.xml
2025-12-04T10:35:20.1084544Z ============================= test session starts ==============================
2025-12-04T10:35:20.1084846Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1084945Z cachedir: .pytest_cache
2025-12-04T10:35:20.1085389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1085491Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1085604Z configfile: pytest.ini
2025-12-04T10:35:20.1086106Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1086296Z collecting ... collected 188 items / 22 deselected / 166 selected
2025-12-04T10:35:20.1086427Z stepcurrent: skipping 22 already run items.
2025-12-04T10:35:20.1086522Z Running 166 items in this shard
2025-12-04T10:35:20.1086526Z
2025-12-04T10:35:20.1087534Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1088339Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1088802Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1089277Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.1089694Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.1090059Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.1090537Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x1 = (xindex % ks1)
2025-12-04T10:35:20.1091056Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x2 = triton_helpers.div_floor_integer(xindex, ks1)
2025-12-04T10:35:20.1091528Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + load_seed_offset)
2025-12-04T10:35:20.1091891Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = x0
2025-12-04T10:35:20.1092375Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32))
2025-12-04T10:35:20.1092806Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp2.to(tl.float32)
2025-12-04T10:35:20.1093260Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float8e4nv)
2025-12-04T10:35:20.1093838Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask)
2025-12-04T10:35:20.1094138Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.1095831Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1096334Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1097227Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1097763Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1098524Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1099167Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1099982Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1100682Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1101241Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1102100Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1102507Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1103277Z E1204 10:21:55.275000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1103389Z ('RERUN', {'yellow': True}) [2.1723s] [ 0%]
2025-12-04T10:35:20.1104378Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1105165Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1105669Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1106158Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.1106573Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.1106938Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.1107397Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x1 = (xindex % ks1)
2025-12-04T10:35:20.1108204Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x2 = triton_helpers.div_floor_integer(xindex, ks1)
2025-12-04T10:35:20.1108689Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + load_seed_offset)
2025-12-04T10:35:20.1109041Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = x0
2025-12-04T10:35:20.1109519Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32))
2025-12-04T10:35:20.1109951Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp2.to(tl.float32)
2025-12-04T10:35:20.1110400Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float8e4nv)
2025-12-04T10:35:20.1110963Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask)
2025-12-04T10:35:20.1111265Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.1112898Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1113352Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1114239Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1114896Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1115662Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1116239Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1116997Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1117648Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1118167Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1118961Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1119322Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1120085Z E1204 10:21:55.736000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1120247Z ('RERUN', {'yellow': True}) [0.4287s] [ 0%]
2025-12-04T10:35:20.1121234Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1122023Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1122481Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1122963Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.1123380Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.1123749Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.1124150Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x1 = (xindex % ks1)
2025-12-04T10:35:20.1124651Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x2 = triton_helpers.div_floor_integer(xindex, ks1)
2025-12-04T10:35:20.1125122Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + load_seed_offset)
2025-12-04T10:35:20.1125476Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = x0
2025-12-04T10:35:20.1125951Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32))
2025-12-04T10:35:20.1126380Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp2.to(tl.float32)
2025-12-04T10:35:20.1126907Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float8e4nv)
2025-12-04T10:35:20.1127483Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask)
2025-12-04T10:35:20.1127784Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.1129421Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1129876Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1130767Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1131339Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1132095Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1132713Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1133457Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1134117Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1134634Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1135432Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1135741Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1136501Z E1204 10:21:56.166000 81181 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1136587Z FAILED [0.4277s] [ 0%]
2025-12-04T10:35:20.1136592Z
2025-12-04T10:35:20.1136714Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.1136957Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1137058Z Traceback (most recent call last):
2025-12-04T10:35:20.1137396Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1137513Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841
2025-12-04T10:35:20.1138003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1138219Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1138659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1138821Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1139328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1139454Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1139920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1140191Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1140635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1140765Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1141168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1141271Z return self._compile_to_module()
2025-12-04T10:35:20.1141680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1141863Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1142307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1142457Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1142876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1143077Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1143587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1143696Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1144128Z File "/tmp/tmpw1bmfch9/yx/cyxlu4hzpv7kciwzh33qgdxvtkvckv7cr5jucrxqo7oi5d2sdr2n.py", line 60, in <module>
2025-12-04T10:35:20.1144524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1144623Z kernel.precompile(
2025-12-04T10:35:20.1145093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1145194Z self._precompile_worker()
2025-12-04T10:35:20.1145740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1145902Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1146419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1146587Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1146968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1147180Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1147561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1147850Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1148047Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1148473Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1148660Z ^
2025-12-04T10:35:20.1149052Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1149057Z
2025-12-04T10:35:20.1149670Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1149678Z
2025-12-04T10:35:20.1149682Z
2025-12-04T10:35:20.1149861Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1150449Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1150464Z
2025-12-04T10:35:20.1150690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1150877Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1150972Z frames [('total', 1)]
2025-12-04T10:35:20.1151067Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1151530Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1151727Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1151806Z graph_break []
2025-12-04T10:35:20.1152027Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1152270Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1152370Z Traceback (most recent call last):
2025-12-04T10:35:20.1152751Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1152867Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841
2025-12-04T10:35:20.1153276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1153497Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1153933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1154098Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1154534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1154654Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1155111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1155386Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1155891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1156017Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1156424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1156530Z return self._compile_to_module()
2025-12-04T10:35:20.1156939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1157077Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1157517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1157626Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1158053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1158245Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1158827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1158943Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1159367Z File "/tmp/tmpcm0pa6f9/zn/czn66v4xmhea5twk6qxq65kb4b7kbketol6ch6z6h4du7mkb7z5h.py", line 60, in <module>
2025-12-04T10:35:20.1159767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1159861Z kernel.precompile(
2025-12-04T10:35:20.1160339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1160441Z self._precompile_worker()
2025-12-04T10:35:20.1160949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1161095Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1161615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1161781Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1162169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1162371Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1162789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1163078Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1163311Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1163739Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1163819Z ^
2025-12-04T10:35:20.1164207Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1164212Z
2025-12-04T10:35:20.1164822Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1164827Z
2025-12-04T10:35:20.1164834Z
2025-12-04T10:35:20.1165012Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1165603Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1165610Z
2025-12-04T10:35:20.1165832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1166016Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1166113Z frames [('total', 1)]
2025-12-04T10:35:20.1166209Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1166685Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1166877Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1166960Z graph_break []
2025-12-04T10:35:20.1167117Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1167298Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1167382Z frames [('total', 1)]
2025-12-04T10:35:20.1167483Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1167671Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1168143Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1168221Z graph_break []
2025-12-04T10:35:20.1168451Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1168582Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1168821Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1168921Z Traceback (most recent call last):
2025-12-04T10:35:20.1169264Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1169390Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841
2025-12-04T10:35:20.1169817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1170033Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1170473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1170654Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1171087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1171207Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1171683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1172004Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1172460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1172625Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1173042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1173156Z return self._compile_to_module()
2025-12-04T10:35:20.1173574Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1173726Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1174166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1174274Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1174719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1174917Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1175419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1175536Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1175979Z File "/tmp/tmpcbu5hy36/qo/cqoehnxgjew6n6n6bk3nvdevhbwxdxvtykfp2p7hz6f2cyn4sbzv.py", line 60, in 2025-12-04T10:35:20.1176387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.1176477Z kernel.precompile( 2025-12-04T10:35:20.1176947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.1177058Z self._precompile_worker() 2025-12-04T10:35:20.1177566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1177731Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1178247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1178418Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1178903Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1179172Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1179547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1179837Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1180033Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.1180469Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1180541Z ^ 2025-12-04T10:35:20.1180928Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1180933Z 2025-12-04T10:35:20.1181557Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1181561Z 2025-12-04T10:35:20.1181565Z 2025-12-04T10:35:20.1181749Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1182348Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T10:35:20.1182397Z 2025-12-04T10:35:20.1182627Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1182811Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1182936Z frames [('total', 1)] 2025-12-04T10:35:20.1183034Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1183500Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1183699Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1183782Z graph_break [] 2025-12-04T10:35:20.1183939Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1184119Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1184213Z frames [('total', 1)] 2025-12-04T10:35:20.1184313Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1184497Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1184957Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1185038Z graph_break [] 2025-12-04T10:35:20.1185182Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1185368Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1185450Z frames [('total', 1)] 2025-12-04T10:35:20.1185548Z stats 
[('calls_captured', 11)] 2025-12-04T10:35:20.1185739Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1186188Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1186276Z graph_break [] 2025-12-04T10:35:20.1186420Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1186980Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.xml - 2025-12-04T10:35:20.1187124Z =========================== short test summary info ============================ 2025-12-04T10:35:20.1187716Z FAILED [0.4277s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.1188250Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1188325Z ^ 2025-12-04T10:35:20.1188712Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1188717Z 2025-12-04T10:35:20.1189335Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1189344Z 2025-12-04T10:35:20.1189348Z 2025-12-04T10:35:20.1189524Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1190117Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T10:35:20.1190124Z 2025-12-04T10:35:20.1190345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1190501Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.1190671Z ================== 1 failed, 22 deselected, 2 rerun in 3.06s =================== 2025-12-04T10:35:20.1190751Z Got exit code 1 2025-12-04T10:35:20.1190844Z Retrying single test... 2025-12-04T10:35:20.1191248Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.xml 2025-12-04T10:35:20.1191383Z ============================= test session starts ============================== 2025-12-04T10:35:20.1191725Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.1191816Z cachedir: .pytest_cache 2025-12-04T10:35:20.1192262Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.1192422Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.1192508Z configfile: pytest.ini 2025-12-04T10:35:20.1192973Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.1193156Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.1193669Z stepcurrent: skipping 22 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T10:35:20.1193765Z Running 1 items in this shard 2025-12-04T10:35:20.1193771Z 2025-12-04T10:35:20.1194558Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:05.371903031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1194565Z 2025-12-04T10:35:20.1195013Z [W1204 10:22:14.821799036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1195018Z 2025-12-04T10:35:20.1195489Z [W1204 10:22:14.822031080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1195495Z 2025-12-04T10:35:20.1195944Z [W1204 10:22:14.824322815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1195949Z 2025-12-04T10:35:20.1196377Z [W1204 10:22:14.824508569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1196384Z 2025-12-04T10:35:20.1196812Z [W1204 10:22:14.826608730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1196819Z 2025-12-04T10:35:20.1197246Z [W1204 10:22:14.826890605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1197250Z 2025-12-04T10:35:20.1197758Z [W1204 10:22:14.827053789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.1197763Z 2025-12-04T10:35:20.1198201Z [W1204 10:22:14.827462137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1198205Z 2025-12-04T10:35:20.1198635Z [W1204 10:22:14.827634680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1198641Z 2025-12-04T10:35:20.1199073Z [W1204 10:22:14.828094209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1199078Z 2025-12-04T10:35:20.1199504Z [W1204 10:22:14.828265022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1199511Z 2025-12-04T10:35:20.1199940Z [W1204 10:22:14.828612259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1199948Z 2025-12-04T10:35:20.1200377Z [W1204 10:22:14.828789143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1200382Z 2025-12-04T10:35:20.1200811Z [W1204 10:22:14.829109049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1200815Z 2025-12-04T10:35:20.1201285Z [W1204 10:22:14.829278062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1201289Z 2025-12-04T10:35:20.1201717Z [W1204 10:22:14.829598308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1201767Z 2025-12-04T10:35:20.1202200Z [W1204 10:22:14.829764192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.1202205Z 2025-12-04T10:35:20.1202321Z ('RERUN', {'yellow': True}) [11.6561s] [100%] 2025-12-04T10:35:20.1203108Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:15.961341030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1203114Z 2025-12-04T10:35:20.1203553Z [W1204 10:22:15.961694337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1203560Z 2025-12-04T10:35:20.1203988Z [W1204 10:22:15.961859811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1203995Z 2025-12-04T10:35:20.1204422Z [W1204 10:22:15.962320090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1204427Z 2025-12-04T10:35:20.1204859Z [W1204 10:22:15.962496853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1204863Z 2025-12-04T10:35:20.1205293Z [W1204 10:22:15.962815409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1205297Z 2025-12-04T10:35:20.1205747Z [W1204 10:22:15.963062644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1205763Z 2025-12-04T10:35:20.1206221Z [W1204 10:22:15.963232837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1206227Z 2025-12-04T10:35:20.1206655Z [W1204 10:22:15.963602535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.1206659Z 2025-12-04T10:35:20.1207254Z [W1204 10:22:15.963767098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1207259Z 2025-12-04T10:35:20.1207687Z [W1204 10:22:15.964136065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1207691Z 2025-12-04T10:35:20.1208264Z [W1204 10:22:15.964302658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1208271Z 2025-12-04T10:35:20.1208779Z [W1204 10:22:15.964616875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1208790Z 2025-12-04T10:35:20.1209374Z [W1204 10:22:15.964786198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1209380Z 2025-12-04T10:35:20.1209849Z [W1204 10:22:15.965086194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1209854Z 2025-12-04T10:35:20.1210292Z [W1204 10:22:15.965249777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1210296Z 2025-12-04T10:35:20.1210723Z [W1204 10:22:15.965556343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1210814Z 2025-12-04T10:35:20.1211243Z [W1204 10:22:15.965722326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.1211252Z 2025-12-04T10:35:20.1211415Z ('RERUN', {'yellow': True}) [0.6952s] [100%] 2025-12-04T10:35:20.1212201Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:16.662014485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1212206Z 2025-12-04T10:35:20.1212645Z [W1204 10:22:16.662367672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1212649Z 2025-12-04T10:35:20.1213078Z [W1204 10:22:16.662538115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1213082Z 2025-12-04T10:35:20.1213515Z [W1204 10:22:16.663007385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1213522Z 2025-12-04T10:35:20.1213950Z [W1204 10:22:16.663188988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1213956Z 2025-12-04T10:35:20.1214387Z [W1204 10:22:16.663490984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1214391Z 2025-12-04T10:35:20.1214826Z [W1204 10:22:16.663739339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1214830Z 2025-12-04T10:35:20.1215259Z [W1204 10:22:16.663904892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1215271Z 2025-12-04T10:35:20.1215720Z [W1204 10:22:16.664288490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.1215730Z 2025-12-04T10:35:20.1216180Z [W1204 10:22:16.664455753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1216186Z 2025-12-04T10:35:20.1216616Z [W1204 10:22:16.664829230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1216620Z 2025-12-04T10:35:20.1217156Z [W1204 10:22:16.664996894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1217162Z 2025-12-04T10:35:20.1217593Z [W1204 10:22:16.665318160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1217597Z 2025-12-04T10:35:20.1218028Z [W1204 10:22:16.665495413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1218035Z 2025-12-04T10:35:20.1218467Z [W1204 10:22:16.666068545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1218474Z 2025-12-04T10:35:20.1218900Z [W1204 10:22:16.666240498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1218904Z 2025-12-04T10:35:20.1219403Z [W1204 10:22:16.667764378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1219408Z 2025-12-04T10:35:20.1219837Z [W1204 10:22:16.667941201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:35:20.1219842Z 2025-12-04T10:35:20.1220264Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354] Failed to remove temporary cache dir at /tmp/tmpnm6gjk78 2025-12-04T10:35:20.1220651Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354] Traceback (most recent call last): 2025-12-04T10:35:20.1221185Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354] File "/opt/conda/envs/py_3.10/lib/python3.10/shutil.py", line 662, in _rmtree_safe_fd 2025-12-04T10:35:20.1221603Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354] os.rmdir(entry.name, dir_fd=topfd) 2025-12-04T10:35:20.1222288Z W1204 10:22:17.069000 81362 site-packages/torch/_inductor/utils.py:1354] OSError: [Errno 39] Directory not empty: 'D7AGGDEZNIS5BGFNPDIKXLYBRZFN3WBF3ZTPZHJPZOI2QCCFJSMA' 2025-12-04T10:35:20.1222839Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] Failed to remove temporary cache dir at /tmp/tmpnm6gjk78 2025-12-04T10:35:20.1223283Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] Traceback (most recent call last): 2025-12-04T10:35:20.1223861Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] File "/opt/conda/envs/py_3.10/lib/python3.10/shutil.py", line 662, in _rmtree_safe_fd 2025-12-04T10:35:20.1224217Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] os.rmdir(entry.name, dir_fd=topfd) 2025-12-04T10:35:20.1224607Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] OSError: [Errno 39] Directory not empty: 'triton' 2025-12-04T10:35:20.1225025Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] Failed to remove temporary cache dir at /tmp/tmpnm6gjk78 2025-12-04T10:35:20.1225369Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] Traceback (most recent call last): 2025-12-04T10:35:20.1225905Z W1204 10:22:17.070000 81362 
site-packages/torch/_inductor/utils.py:1354] File "/opt/conda/envs/py_3.10/lib/python3.10/shutil.py", line 729, in rmtree 2025-12-04T10:35:20.1226175Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] os.rmdir(path) 2025-12-04T10:35:20.1226603Z W1204 10:22:17.070000 81362 site-packages/torch/_inductor/utils.py:1354] OSError: [Errno 39] Directory not empty: '/tmp/tmpnm6gjk78' 2025-12-04T10:35:20.1226684Z FAILED [0.7106s] [100%] 2025-12-04T10:35:20.1226695Z 2025-12-04T10:35:20.1226811Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.1227049Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T10:35:20.1227153Z Traceback (most recent call last): 2025-12-04T10:35:20.1227583Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T10:35:20.1227703Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T10:35:20.1228121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1228338Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1228779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1228937Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1229369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1229488Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1229944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1230215Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1230658Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1230778Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1231190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1231334Z return self._compile_to_module() 2025-12-04T10:35:20.1231742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1231927Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1232363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1232480Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1232898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1233088Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1233594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1233699Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1234140Z File "/tmp/tmph0jzyhxz/bd/cbdw6ykoryqwp5jpfuohx52saca4vpiha2kaxsik7mkvlyuo2clb.py", line 193, in 2025-12-04T10:35:20.1234526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.1234622Z self._wait_futures(scope) 2025-12-04T10:35:20.1235044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.1235143Z kernel = result.result() 2025-12-04T10:35:20.1235513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 
2025-12-04T10:35:20.1235610Z return self.result_fn() 2025-12-04T10:35:20.1236011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.1236119Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.1236552Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.1236557Z 2025-12-04T10:35:20.1236728Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.1236836Z Traceback (most recent call last): 2025-12-04T10:35:20.1237291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.1237369Z result = job() 2025-12-04T10:35:20.1237959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.1238073Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.1238549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.1238643Z self._precompile_worker() 2025-12-04T10:35:20.1239152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1239301Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1239811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1239981Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1240362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1240566Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1240939Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1241219Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1241416Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1241836Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1241903Z ^
2025-12-04T10:35:20.1242335Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1242340Z
2025-12-04T10:35:20.1242344Z
2025-12-04T10:35:20.1242951Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1242957Z
2025-12-04T10:35:20.1242961Z
2025-12-04T10:35:20.1243143Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1243727Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1243735Z
2025-12-04T10:35:20.1243953Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1244137Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1244218Z frames [('total', 1)]
2025-12-04T10:35:20.1244311Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1244877Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1245067Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1245146Z graph_break []
2025-12-04T10:35:20.1245290Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1245489Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1246559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1246656Z if out == self.unknown_value:
2025-12-04T10:35:20.1246896Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1246994Z Traceback (most recent call last):
2025-12-04T10:35:20.1247324Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1247443Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841
2025-12-04T10:35:20.1247937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1248156Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1248586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1248744Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1249179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1249301Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1249751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1250027Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1250467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1250588Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1250990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1251086Z return self._compile_to_module()
2025-12-04T10:35:20.1251539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1251672Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1252151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1252255Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1252676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1252869Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1253363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1253463Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1253896Z File "/tmp/tmp9zfsqhq3/e3/ce3pmk2zq62c6ibr5psczwpukuth2alrinz5z45k6dfg4uy46ltw.py", line 193, in <module>
2025-12-04T10:35:20.1254284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1254388Z self._wait_futures(scope)
2025-12-04T10:35:20.1254808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1254900Z kernel = result.result()
2025-12-04T10:35:20.1255283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1255372Z return self.result_fn()
2025-12-04T10:35:20.1255774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1255885Z raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1256211Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1256218Z
2025-12-04T10:35:20.1256391Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1256491Z Traceback (most recent call last):
2025-12-04T10:35:20.1256949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1257029Z result = job()
2025-12-04T10:35:20.1257528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1257731Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1258198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1258291Z self._precompile_worker()
2025-12-04T10:35:20.1258804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1258955Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1259551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1259735Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1260140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1260365Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1260766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1261066Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1261234Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1261681Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1261828Z ^
2025-12-04T10:35:20.1262213Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1262257Z
2025-12-04T10:35:20.1262262Z
2025-12-04T10:35:20.1262865Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1262874Z
2025-12-04T10:35:20.1262878Z
2025-12-04T10:35:20.1263062Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1263649Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1263654Z
2025-12-04T10:35:20.1263878Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1264056Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1264148Z frames [('total', 1)]
2025-12-04T10:35:20.1264240Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1264801Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1264996Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1265077Z graph_break []
2025-12-04T10:35:20.1265226Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1265404Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1266447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1266548Z if out == self.unknown_value:
2025-12-04T10:35:20.1266722Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1266802Z frames [('total', 1)]
2025-12-04T10:35:20.1266896Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1267081Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1267638Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1267823Z graph_break []
2025-12-04T10:35:20.1267969Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1268094Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1268328Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T10:35:20.1268429Z Traceback (most recent call last):
2025-12-04T10:35:20.1268765Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback
2025-12-04T10:35:20.1268879Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841
2025-12-04T10:35:20.1269291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1269504Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1269936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1270100Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1270531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1270648Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1271101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1271415Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1271858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1272019Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1272422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1272522Z return self._compile_to_module()
2025-12-04T10:35:20.1272933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1273065Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1273505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1273608Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1274034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1274228Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1274727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1274841Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1275279Z File "/tmp/tmpnm6gjk78/vl/cvlsw3ncyp3l7ltyeutz2mire53d3lqdt5pm63pej6dotq6ssgnm.py", line 193, in <module>
2025-12-04T10:35:20.1275662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait
2025-12-04T10:35:20.1275753Z self._wait_futures(scope)
2025-12-04T10:35:20.1276168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures
2025-12-04T10:35:20.1276266Z kernel = result.result()
2025-12-04T10:35:20.1276636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result
2025-12-04T10:35:20.1276724Z return self.result_fn()
2025-12-04T10:35:20.1277135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result
2025-12-04T10:35:20.1277238Z raise e.with_name(kernel_name) from e
2025-12-04T10:35:20.1277563Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1277657Z
2025-12-04T10:35:20.1277832Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1277927Z Traceback (most recent call last):
2025-12-04T10:35:20.1278385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1278460Z result = job()
2025-12-04T10:35:20.1278961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1279078Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1279544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1279644Z self._precompile_worker()
2025-12-04T10:35:20.1280145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1280294Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1280798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1280960Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1281342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1281593Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1281963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1282285Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1282442Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1282865Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1282934Z ^
2025-12-04T10:35:20.1283318Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1283323Z
2025-12-04T10:35:20.1283327Z
2025-12-04T10:35:20.1283933Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1283941Z
2025-12-04T10:35:20.1283945Z
2025-12-04T10:35:20.1284121Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1284714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1284718Z
2025-12-04T10:35:20.1284940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1285123Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1285211Z frames [('total', 1)]
2025-12-04T10:35:20.1285303Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1285869Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1286052Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1286130Z graph_break []
2025-12-04T10:35:20.1286275Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1286446Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:35:20.1287486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:35:20.1287662Z if out == self.unknown_value:
2025-12-04T10:35:20.1287842Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1287929Z frames [('total', 1)]
2025-12-04T10:35:20.1288023Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1288206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1288767Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1288845Z graph_break []
2025-12-04T10:35:20.1288995Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1289172Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1289249Z frames [('total', 1)]
2025-12-04T10:35:20.1289344Z stats [('calls_captured', 11)]
2025-12-04T10:35:20.1289527Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1290085Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.1290172Z graph_break []
2025-12-04T10:35:20.1290312Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)]
2025-12-04T10:35:20.1290871Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.xml -
2025-12-04T10:35:20.1291057Z =========================== short test summary info ============================
2025-12-04T10:35:20.1291792Z FAILED [0.7106s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
2025-12-04T10:35:20.1291837Z
2025-12-04T10:35:20.1292010Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0
2025-12-04T10:35:20.1292114Z Traceback (most recent call last):
2025-12-04T10:35:20.1292582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job
2025-12-04T10:35:20.1292661Z result = job()
2025-12-04T10:35:20.1293165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton
2025-12-04T10:35:20.1293283Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.1293761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile
2025-12-04T10:35:20.1293855Z self._precompile_worker()
2025-12-04T10:35:20.1294363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1294507Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1295017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1295179Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1295578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1295817Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1296190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1296475Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1296629Z triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1297043Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1297118Z ^
2025-12-04T10:35:20.1297611Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1297616Z
2025-12-04T10:35:20.1297620Z
2025-12-04T10:35:20.1298226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1298230Z
2025-12-04T10:35:20.1298234Z
2025-12-04T10:35:20.1298414Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1298996Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1299010Z
2025-12-04T10:35:20.1299300Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1299448Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.1299629Z ================= 1 failed, 187 deselected, 2 rerun in 13.10s ==================
2025-12-04T10:35:20.1299710Z Got exit code 1
2025-12-04T10:35:20.1299793Z Retrying single test...
2025-12-04T10:35:20.1300193Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.xml
2025-12-04T10:35:20.1300324Z ============================= test session starts ==============================
2025-12-04T10:35:20.1300618Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1300749Z cachedir: .pytest_cache
2025-12-04T10:35:20.1301197Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1301341Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1301426Z configfile: pytest.ini
2025-12-04T10:35:20.1301883Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1302074Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.1302588Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T10:35:20.1302687Z Running 1 items in this shard
2025-12-04T10:35:20.1302691Z
2025-12-04T10:35:20.1303482Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:25.871446482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1303490Z
2025-12-04T10:35:20.1303922Z [W1204 10:22:35.228605735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1303934Z
2025-12-04T10:35:20.1304366Z [W1204 10:22:35.228842229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1304370Z
2025-12-04T10:35:20.1304803Z [W1204 10:22:35.231152875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1304808Z
2025-12-04T10:35:20.1305240Z [W1204 10:22:35.231348598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1305244Z
2025-12-04T10:35:20.1305696Z [W1204 10:22:35.233470860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1305703Z
2025-12-04T10:35:20.1306167Z [W1204 10:22:35.233758856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1306174Z
2025-12-04T10:35:20.1306599Z [W1204 10:22:35.233925249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1306603Z
2025-12-04T10:35:20.1307111Z [W1204 10:22:35.234331347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1307115Z
2025-12-04T10:35:20.1307542Z [W1204 10:22:35.234506480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1307547Z
2025-12-04T10:35:20.1312697Z [W1204 10:22:35.234967919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1312711Z
2025-12-04T10:35:20.1313171Z [W1204 10:22:35.235146043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1313177Z
2025-12-04T10:35:20.1313615Z [W1204 10:22:35.235497820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1313625Z
2025-12-04T10:35:20.1314066Z [W1204 10:22:35.235664433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1314071Z
2025-12-04T10:35:20.1314503Z [W1204 10:22:35.235975139 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1314508Z
2025-12-04T10:35:20.1314945Z [W1204 10:22:35.236141092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1315051Z
2025-12-04T10:35:20.1315484Z [W1204 10:22:35.236459698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1315554Z
2025-12-04T10:35:20.1315997Z [W1204 10:22:35.236630952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1316001Z
2025-12-04T10:35:20.1316117Z ('RERUN', {'yellow': True}) [11.5590s] [100%]
2025-12-04T10:35:20.1316923Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:36.367496585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1316928Z
2025-12-04T10:35:20.1317359Z [W1204 10:22:36.367839272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1317367Z
2025-12-04T10:35:20.1317796Z [W1204 10:22:36.368005475 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1317810Z
2025-12-04T10:35:20.1318241Z [W1204 10:22:36.368473554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1318247Z
2025-12-04T10:35:20.1318677Z [W1204 10:22:36.368653488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1318685Z
2025-12-04T10:35:20.1319120Z [W1204 10:22:36.368947363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1319125Z
2025-12-04T10:35:20.1319555Z [W1204 10:22:36.369187588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1319560Z
2025-12-04T10:35:20.1319993Z [W1204 10:22:36.369350051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1319997Z
2025-12-04T10:35:20.1320430Z [W1204 10:22:36.369718498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1320436Z
2025-12-04T10:35:20.1320871Z [W1204 10:22:36.369886042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1320876Z
2025-12-04T10:35:20.1321409Z [W1204 10:22:36.370293530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1321414Z
2025-12-04T10:35:20.1321855Z [W1204 10:22:36.370466563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1321860Z
2025-12-04T10:35:20.1322289Z [W1204 10:22:36.370789689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1322296Z
2025-12-04T10:35:20.1322731Z [W1204 10:22:36.370955243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1322737Z
2025-12-04T10:35:20.1323175Z [W1204 10:22:36.371268269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1323179Z
2025-12-04T10:35:20.1323609Z [W1204 10:22:36.371433512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1323614Z
2025-12-04T10:35:20.1324047Z [W1204 10:22:36.371742558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1324052Z
2025-12-04T10:35:20.1324483Z [W1204 10:22:36.371906691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1324528Z
2025-12-04T10:35:20.1324643Z ('RERUN', {'yellow': True}) [0.6935s] [100%]
2025-12-04T10:35:20.1325442Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 10:22:37.062598464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1325495Z
2025-12-04T10:35:20.1325967Z [W1204 10:22:37.062947151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1325971Z
2025-12-04T10:35:20.1326501Z [W1204 10:22:37.063114814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1326506Z
2025-12-04T10:35:20.1326935Z [W1204 10:22:37.063590864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1326950Z
2025-12-04T10:35:20.1327377Z [W1204 10:22:37.063764207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1327387Z
2025-12-04T10:35:20.1327811Z [W1204 10:22:37.064056153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1327815Z
2025-12-04T10:35:20.1328256Z [W1204 10:22:37.064297917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1328261Z
2025-12-04T10:35:20.1328690Z [W1204 10:22:37.064458880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1328694Z
2025-12-04T10:35:20.1329131Z [W1204 10:22:37.064820248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1329136Z
2025-12-04T10:35:20.1329566Z [W1204 10:22:37.064988661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1329570Z
2025-12-04T10:35:20.1330009Z [W1204 10:22:37.065360208 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1330013Z
2025-12-04T10:35:20.1330441Z [W1204 10:22:37.065530101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1330528Z
2025-12-04T10:35:20.1330959Z [W1204 10:22:37.065848088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1330968Z
2025-12-04T10:35:20.1331398Z [W1204 10:22:37.066014671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1331405Z
2025-12-04T10:35:20.1331831Z [W1204 10:22:37.066324457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:35:20.1331835Z 2025-12-04T10:35:20.1332267Z [W1204 10:22:37.066491010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1332273Z 2025-12-04T10:35:20.1332701Z [W1204 10:22:37.066797916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1332713Z 2025-12-04T10:35:20.1333144Z [W1204 10:22:37.066964219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:35:20.1333148Z 2025-12-04T10:35:20.1333232Z FAILED [0.7054s] [100%] 2025-12-04T10:35:20.1333236Z 2025-12-04T10:35:20.1333363Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.1333644Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T10:35:20.1333750Z Traceback (most recent call last): 2025-12-04T10:35:20.1334091Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T10:35:20.1334251Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T10:35:20.1334673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1334893Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1335338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1335507Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1335940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1336061Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1336518Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1336788Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1337234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1337355Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1337764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1337870Z return self._compile_to_module() 2025-12-04T10:35:20.1338283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1338417Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1338858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1338960Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1339441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1339638Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1340244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1340356Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1340792Z File "/tmp/tmpmamsjdzh/qd/cqd2dgft4negc555emrg7ptbhsxbirtnuwo2gvd4cvb77fi6j57d.py", line 193, in <module> 2025-12-04T10:35:20.1341179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.1341271Z self._wait_futures(scope) 2025-12-04T10:35:20.1341691Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.1341792Z kernel = result.result() 2025-12-04T10:35:20.1342168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.1342261Z return self.result_fn() 2025-12-04T10:35:20.1342669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.1342778Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.1343110Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.1343115Z 2025-12-04T10:35:20.1343287Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.1343389Z Traceback (most recent call last): 2025-12-04T10:35:20.1343855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.1343984Z result = job() 2025-12-04T10:35:20.1344492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.1344647Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.1345117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.1345217Z self._precompile_worker() 2025-12-04T10:35:20.1345768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1345924Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1346434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1346601Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1346988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1347193Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1347573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1347857Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1348021Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1348448Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1348519Z ^ 2025-12-04T10:35:20.1348911Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1348918Z 2025-12-04T10:35:20.1348922Z 2025-12-04T10:35:20.1349539Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1349547Z 2025-12-04T10:35:20.1349550Z 2025-12-04T10:35:20.1349736Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1350332Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T10:35:20.1350421Z 2025-12-04T10:35:20.1350646Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1350825Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1350920Z frames [('total', 1)] 2025-12-04T10:35:20.1351012Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1351580Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1351774Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1351855Z graph_break [] 2025-12-04T10:35:20.1352006Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1352188Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:35:20.1353240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:35:20.1353337Z if out == self.unknown_value: 2025-12-04T10:35:20.1353579Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T10:35:20.1353691Z Traceback (most recent call last): 2025-12-04T10:35:20.1354072Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T10:35:20.1354188Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T10:35:20.1354601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1354849Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1355293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1355469Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1355945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1356072Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1356530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1356813Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1357251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1357378Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1357792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1357891Z return self._compile_to_module() 2025-12-04T10:35:20.1358302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 
2025-12-04T10:35:20.1358441Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1358878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1358992Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1359412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1359603Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1360108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1360216Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1360735Z File "/tmp/tmpfustvwn0/74/c74sqjwxqkvttc5lr25tjtrfqrqrmohcrz2aeb3bsqq3sv4tobqm.py", line 193, in <module> 2025-12-04T10:35:20.1361117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.1361211Z self._wait_futures(scope) 2025-12-04T10:35:20.1361632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.1361732Z kernel = result.result() 2025-12-04T10:35:20.1362102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.1362195Z return self.result_fn() 2025-12-04T10:35:20.1362600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.1362710Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.1363037Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.1363049Z 2025-12-04T10:35:20.1363220Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.1363322Z Traceback (most
recent call last): 2025-12-04T10:35:20.1363779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.1363861Z result = job() 2025-12-04T10:35:20.1364360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.1364521Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.1365003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.1365139Z self._precompile_worker() 2025-12-04T10:35:20.1365676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1365850Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1366352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1366519Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1366896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1367101Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1367478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1367762Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1367922Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1368340Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1368406Z ^ 2025-12-04T10:35:20.1368795Z ValueError("type fp8e4nv not 
supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1368800Z 2025-12-04T10:35:20.1368804Z 2025-12-04T10:35:20.1369406Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1369413Z 2025-12-04T10:35:20.1369417Z 2025-12-04T10:35:20.1369601Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1370194Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T10:35:20.1370198Z 2025-12-04T10:35:20.1370422Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1370769Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1370853Z frames [('total', 1)] 2025-12-04T10:35:20.1370954Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1371517Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1371712Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1371795Z graph_break [] 2025-12-04T10:35:20.1371938Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1372112Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:35:20.1373162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:35:20.1373264Z if out == self.unknown_value: 2025-12-04T10:35:20.1373449Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1373534Z frames [('total', 1)] 2025-12-04T10:35:20.1373631Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1373823Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1374390Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1374518Z graph_break [] 2025-12-04T10:35:20.1374669Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1374852Z =================================== FAILURES =================================== 2025-12-04T10:35:20.1375098Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T10:35:20.1375203Z Traceback (most recent call last): 2025-12-04T10:35:20.1375543Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T10:35:20.1375687Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T10:35:20.1376129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1376346Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1376781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1376943Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1377378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1377500Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1377959Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1378234Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1378678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1378804Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1379279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1379378Z return self._compile_to_module() 2025-12-04T10:35:20.1379800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1379935Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1380380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1380571Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1380992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1381188Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1381684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1381793Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1382227Z File "/tmp/tmpv0eg6o2x/jh/cjhw5ltfklcgqyvrv5j2bokd42ir36gfz77m6gmqk6yep6vcej2y.py", line 193, in <module> 2025-12-04T10:35:20.1382612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.1382710Z self._wait_futures(scope) 2025-12-04T10:35:20.1383130Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.1383220Z kernel = result.result() 2025-12-04T10:35:20.1383594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.1383683Z return self.result_fn() 2025-12-04T10:35:20.1384094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.1384247Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.1384573Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.1384578Z 2025-12-04T10:35:20.1384758Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.1384894Z Traceback (most recent call last): 2025-12-04T10:35:20.1385358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.1385455Z result = job() 2025-12-04T10:35:20.1386003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.1386124Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.1386594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.1386686Z self._precompile_worker() 2025-12-04T10:35:20.1387202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1387346Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1387863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1388026Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1388412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1388630Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1389001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1389282Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1389448Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1389865Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1389944Z ^ 2025-12-04T10:35:20.1390331Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1390336Z 2025-12-04T10:35:20.1390340Z 2025-12-04T10:35:20.1391025Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1391037Z 2025-12-04T10:35:20.1391041Z 2025-12-04T10:35:20.1391220Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1391810Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T10:35:20.1391818Z 2025-12-04T10:35:20.1392048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1392229Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1392318Z frames [('total', 1)] 2025-12-04T10:35:20.1392412Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1392975Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1393174Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1393252Z graph_break [] 2025-12-04T10:35:20.1393398Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1393576Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:35:20.1394617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:35:20.1394761Z if out == self.unknown_value: 2025-12-04T10:35:20.1394973Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1395051Z frames [('total', 1)] 2025-12-04T10:35:20.1395151Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1395334Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1395956Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1396033Z graph_break [] 2025-12-04T10:35:20.1396175Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1396360Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1396446Z frames [('total', 1)] 2025-12-04T10:35:20.1396540Z stats [('calls_captured', 11)] 2025-12-04T10:35:20.1396730Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1397286Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.1397372Z graph_break [] 2025-12-04T10:35:20.1397522Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T10:35:20.1398080Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.xml - 2025-12-04T10:35:20.1398226Z =========================== short test summary info ============================ 2025-12-04T10:35:20.1398964Z FAILED [0.7054s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.1398972Z 2025-12-04T10:35:20.1399145Z 
Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T10:35:20.1399245Z Traceback (most recent call last): 2025-12-04T10:35:20.1399715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.1399801Z result = job() 2025-12-04T10:35:20.1400309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.1400508Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.1400990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.1401087Z self._precompile_worker() 2025-12-04T10:35:20.1401602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1401754Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1402256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1402438Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1402823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1403038Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1403417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1403700Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1403859Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1404275Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK 
: tl.constexpr): 2025-12-04T10:35:20.1404387Z ^ 2025-12-04T10:35:20.1404783Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1404826Z 2025-12-04T10:35:20.1404830Z 2025-12-04T10:35:20.1405434Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1405439Z 2025-12-04T10:35:20.1405442Z 2025-12-04T10:35:20.1405640Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1406227Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T10:35:20.1406232Z 2025-12-04T10:35:20.1406462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1406616Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
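The failure above comes from Triton rejecting `fp8e4nv`, the Triton dtype that Inductor uses for `torch.float8_e4m3fn`, because the device only offers `fp8e4b15` and `fp8e5`. This shard runs on a `linux.g5.4xlarge.nvidia.gpu` runner, whose A10G GPU is an SM 8.6 part. The layernorm FP8 tests further down are skipped with "FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices", but `test_eager_fallback_float16_cuda_float16` is not gated and fails on every rerun. The following is a minimal sketch of the kind of capability gate the skipped tests appear to apply. It assumes the SM 8.9 threshold quoted in that skip message. `supports_fp8_e4m3` and `MIN_FP8_E4M3_CAPABILITY` are hypothetical names, not PyTorch APIs, and the sketch covers only the CUDA case, not ROCm or XPU.

```python
# Hypothetical helper (not part of PyTorch): decide whether a CUDA device can
# compile kernels that use float8_e4m3fn, which Triton lowers to "fp8e4nv".
# The (8, 9) threshold is an assumption taken from the skip message in this log.
MIN_FP8_E4M3_CAPABILITY: tuple[int, int] = (8, 9)


def supports_fp8_e4m3(capability: tuple[int, int]) -> bool:
    """Return True if a (major, minor) CUDA compute capability is SM 8.9 or newer."""
    major, minor = capability
    # Tuples compare lexicographically: (8, 6) < (8, 9) < (9, 0).
    return (major, minor) >= MIN_FP8_E4M3_CAPABILITY
```

On a machine with PyTorch and a GPU, the argument would come from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple. For an SM 8.6 device such as the A10G, the helper returns `False`, so a test guarded this way would be skipped instead of reaching Triton compilation.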
2025-12-04T10:35:20.1406788Z ================= 1 failed, 187 deselected, 2 rerun in 12.99s ================== 2025-12-04T10:35:20.1406874Z Got exit code 1 2025-12-04T10:35:20.1407260Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T10:35:20.1407615Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.1408176Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.xml 2025-12-04T10:35:20.1408316Z ============================= test session starts ============================== 2025-12-04T10:35:20.1408614Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.1408705Z cachedir: .pytest_cache 2025-12-04T10:35:20.1409150Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.1409267Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.1409358Z configfile: pytest.ini 2025-12-04T10:35:20.1409819Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.1410012Z collecting ... collected 188 items / 23 deselected / 165 selected 2025-12-04T10:35:20.1410132Z stepcurrent: skipping 23 already run items. 
2025-12-04T10:35:20.1410353Z Running 165 items in this shard
2025-12-04T10:35:20.1410358Z
2025-12-04T10:35:20.1411147Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 0%]
2025-12-04T10:35:20.1411928Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 1%]
2025-12-04T10:35:20.1412695Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 1%]
2025-12-04T10:35:20.1413456Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 2%]
2025-12-04T10:35:20.1414688Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1415656Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1416167Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.1416535Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.1416983Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1417368Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.1417819Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1418281Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1418778Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1419333Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1419808Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1420178Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.1420615Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1421011Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.1421402Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.1421777Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.1422326Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1422856Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1423325Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1423760Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1424256Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1424712Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1425199Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1425698Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1426180Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1426628Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1427098Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1427548Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1427949Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1428365Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1428863Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1429318Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1429804Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1430208Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1430586Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.1431002Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1431374Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.1431784Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1432232Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1432648Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1433074Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1433582Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1434147Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1434692Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1435093Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1435466Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.1435950Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1436315Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.1436809Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1437259Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1437696Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1438337Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1438976Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1439277Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.1441077Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1441534Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1442429Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1442963Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1443716Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1444290Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1445038Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1445811Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1446343Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1447272Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1447588Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1448350Z E1204 10:22:46.660000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1448461Z ('RERUN', {'yellow': True}) [1.7564s] [ 3%]
2025-12-04T10:35:20.1449695Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1450616Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1451020Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.1451424Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.1451865Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1452254Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.1452704Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1453163Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1453655Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1454159Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1454635Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1455000Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.1455441Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1455883Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.1456270Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.1456646Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.1457189Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1457739Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1458199Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1458626Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1459165Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1459616Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1460104Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1460556Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1461034Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1461485Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1461961Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1462369Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1462808Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1463217Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1463712Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1464164Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1464650Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1465048Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1465418Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.1465840Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1466210Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.1466608Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1467051Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1467454Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1467877Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1468374Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1468935Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1469469Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1469872Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1470245Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.1470728Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1471093Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.1471576Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1472021Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1472449Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1473093Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1473685Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1474026Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.1475861Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1476320Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1477209Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1477742Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1478491Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1479064Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1479809Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1480534Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1481058Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1481981Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1482286Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1483043Z E1204 10:22:46.999000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1483148Z ('RERUN', {'yellow': True}) [0.3115s] [ 3%]
2025-12-04T10:35:20.1484375Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1485297Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1485703Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.1486144Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.1486600Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1486984Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.1487433Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1487886Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1488376Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1488872Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1489341Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1489705Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.1490144Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1490537Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.1490924Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.1491294Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.1491835Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1492361Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1492821Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1493246Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1493734Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1494180Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1494671Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1495123Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1495603Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1496049Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1496521Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1496926Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1497365Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1497772Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1498265Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1498720Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1499271Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1499667Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1500034Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.1500445Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1500812Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.1501210Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1501652Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1502055Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1502478Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1502973Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1503563Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1504098Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1504500Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1504870Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.1505353Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1505768Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.1506249Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1506697Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1507127Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1508035Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1508630Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1508998Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.1510779Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1511236Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1512120Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1512651Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1513406Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1513978Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1514727Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1515492Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1516050Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1516973Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1517276Z E1204 10:22:47.312000 82197
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.1518039Z E1204 10:22:47.312000 82197 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1518120Z FAILED [0.3105s] [ 3%] 2025-12-04T10:35:20.1518125Z 2025-12-04T10:35:20.1518250Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.1518582Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T10:35:20.1518680Z Traceback (most recent call last): 2025-12-04T10:35:20.1519036Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.1519230Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.1519705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1519910Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1520382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1520546Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1520978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1521097Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1521549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1521819Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1522264Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1522384Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1522788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1522888Z return self._compile_to_module() 2025-12-04T10:35:20.1523298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1523435Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1523868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1523973Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1524392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1524584Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1525082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1525186Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1525616Z File "/tmp/tmpcnc9szkk/wj/cwjkbv2lw3skfclmw777nmxunwaxbevv7qru57jg4as3bbpjft7k.py", line 74, in 2025-12-04T10:35:20.1526088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.1526177Z kernel.precompile( 2025-12-04T10:35:20.1526644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.1526743Z self._precompile_worker() 2025-12-04T10:35:20.1527246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 
463, in _precompile_worker 2025-12-04T10:35:20.1527398Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1527901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1528067Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1528446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1528655Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1529027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1529307Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1529496Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.1530094Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1530202Z ^ 2025-12-04T10:35:20.1530591Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1530596Z 2025-12-04T10:35:20.1531207Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1531212Z 2025-12-04T10:35:20.1531216Z 2025-12-04T10:35:20.1531393Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1532135Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T10:35:20.1532143Z 2025-12-04T10:35:20.1532363Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1532543Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1532626Z frames [('total', 1)] 2025-12-04T10:35:20.1532719Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1533122Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1533307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1533388Z graph_break [] 2025-12-04T10:35:20.1533715Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T10:35:20.1533814Z Traceback (most recent call last): 2025-12-04T10:35:20.1534171Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.1534362Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.1534775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1534986Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1535422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1535583Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.1536144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1536263Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1536720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1536988Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1537431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1537550Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1537954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1538054Z return self._compile_to_module()
2025-12-04T10:35:20.1538464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1538597Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1539095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1539197Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1539620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1539859Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1540353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1540600Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1541027Z File "/tmp/tmpam99xy4e/4s/c4s34qh73ne2mlkpkfnfdlwirckkgehnjms2cbxbkqelul36m5wz.py", line 74, in <module>
2025-12-04T10:35:20.1541424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1541510Z kernel.precompile(
2025-12-04T10:35:20.1541980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1542076Z self._precompile_worker()
2025-12-04T10:35:20.1542578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1542726Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1543231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1543396Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1543777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1543982Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1544351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1544636Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1544824Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1545383Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1545458Z ^
2025-12-04T10:35:20.1545886Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1545892Z
2025-12-04T10:35:20.1546585Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1546591Z
2025-12-04T10:35:20.1546595Z
2025-12-04T10:35:20.1546772Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1547507Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1547514Z
2025-12-04T10:35:20.1547734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1547910Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1547995Z frames [('total', 1)]
2025-12-04T10:35:20.1548087Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1548488Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1548673Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1548748Z graph_break []
2025-12-04T10:35:20.1548926Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1549003Z frames [('total', 1)]
2025-12-04T10:35:20.1549094Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1549276Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1549666Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1549793Z graph_break []
2025-12-04T10:35:20.1549917Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1550372Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1550520Z Traceback (most recent call last):
2025-12-04T10:35:20.1550983Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1551247Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1551742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1551952Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1552390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1552554Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1552985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1553110Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1553557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1553834Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1554273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1554389Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1554793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1554892Z return self._compile_to_module()
2025-12-04T10:35:20.1555298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1555437Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1555874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1555980Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1556492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1556686Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1557187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1557288Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1557721Z File "/tmp/tmppr6xkt76/7c/c7cvhkxncxlxgqtbu2pkgagh6dfdr57u2aj6y5gtjkz723m3hp2g.py", line 74, in <module>
2025-12-04T10:35:20.1558108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1558196Z kernel.precompile(
2025-12-04T10:35:20.1558666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1558757Z self._precompile_worker()
2025-12-04T10:35:20.1559266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1559415Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1559917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1560126Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1560501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1560702Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1561118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1561398Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1561600Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1562150Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1562218Z ^
2025-12-04T10:35:20.1562610Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1562620Z
2025-12-04T10:35:20.1563222Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1563229Z
2025-12-04T10:35:20.1563233Z
2025-12-04T10:35:20.1563413Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1564148Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1564153Z
2025-12-04T10:35:20.1564471Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1564650Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1564729Z frames [('total', 1)]
2025-12-04T10:35:20.1564826Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1565225Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1565433Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1565525Z graph_break []
2025-12-04T10:35:20.1565720Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1565801Z frames [('total', 1)]
2025-12-04T10:35:20.1565903Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1566082Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1566564Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1566641Z graph_break []
2025-12-04T10:35:20.1566818Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1566905Z frames [('total', 1)]
2025-12-04T10:35:20.1566994Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1567178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1567570Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1567648Z graph_break []
2025-12-04T10:35:20.1568207Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.xml -
2025-12-04T10:35:20.1568350Z =========================== short test summary info ============================
2025-12-04T10:35:20.1569067Z FAILED [0.3105s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1569619Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1569729Z ^
2025-12-04T10:35:20.1570118Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1570123Z
2025-12-04T10:35:20.1570724Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1570771Z
2025-12-04T10:35:20.1570775Z
2025-12-04T10:35:20.1570961Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1571693Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1571698Z
2025-12-04T10:35:20.1571920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1572072Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.1572254Z ============= 1 failed, 4 skipped, 23 deselected, 2 rerun in 2.42s =============
2025-12-04T10:35:20.1572331Z Got exit code 1
2025-12-04T10:35:20.1572418Z Retrying single test...
2025-12-04T10:35:20.1572819Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.xml
2025-12-04T10:35:20.1572959Z ============================= test session starts ==============================
2025-12-04T10:35:20.1573248Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.1573337Z cachedir: .pytest_cache
2025-12-04T10:35:20.1573785Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.1573883Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.1573967Z configfile: pytest.ini
2025-12-04T10:35:20.1574427Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.1574615Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.1575285Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1575379Z Running 1 items in this shard
2025-12-04T10:35:20.1575385Z
2025-12-04T10:35:20.1576745Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1577672Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1578029Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.1578404Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.1578836Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1579298Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.1579747Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1580198Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1584323Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1584823Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1585373Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1585794Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.1586233Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1586638Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.1587021Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.1587401Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.1587949Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1588401Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1588859Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1589283Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1589776Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1590226Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1590721Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1591251Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1591730Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1592183Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1592609Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1593031Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1593439Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1593838Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1594339Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1594792Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1595281Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1595722Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1596083Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.1596539Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1596909Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.1597316Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1597759Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1598158Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1598596Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1599089Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1599583Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1600118Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1600525Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1600895Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.1601379Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1601751Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.1602231Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1602767Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1603199Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1603794Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1604393Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1604698Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.1606553Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1607048Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1608147Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1608843Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1609894Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1610475Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1611224Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1611880Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1612397Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1613328Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1613633Z E1204 10:22:57.378000 82378
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.1614395Z E1204 10:22:57.378000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1614509Z ('RERUN', {'yellow': True}) [1.7509s] [100%] 2025-12-04T10:35:20.1615869Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T10:35:20.1616799Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1617160Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.1617532Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:20.1617967Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:20.1618360Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.1618809Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.1619338Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.1619897Z E1204 10:22:57.722000 
82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.1620386Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.1620899Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.1621269Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.1621706Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.1622105Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.1622484Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.1622860Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.1623400Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.1623840Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.1624313Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T10:35:20.1624736Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.1625230Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1625680Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T10:35:20.1626172Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1626618Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T10:35:20.1627679Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.1628144Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T10:35:20.1628573Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.1628993Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T10:35:20.1629393Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T10:35:20.1629795Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T10:35:20.1630301Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1630751Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T10:35:20.1631241Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, 
None].to(tl.float32) 2025-12-04T10:35:20.1631684Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T10:35:20.1632047Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T10:35:20.1632506Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T10:35:20.1632881Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T10:35:20.1633284Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T10:35:20.1633727Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T10:35:20.1634126Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T10:35:20.1634557Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T10:35:20.1635051Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1635543Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T10:35:20.1636081Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.1636481Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T10:35:20.1636861Z E1204 10:22:57.722000 82378 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T10:35:20.1637345Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T10:35:20.1637715Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T10:35:20.1638194Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T10:35:20.1638727Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T10:35:20.1639159Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T10:35:20.1639752Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T10:35:20.1640352Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T10:35:20.1640653Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.1642436Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 
'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.1642928Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.1643816Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1644395Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1645152Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1645751Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1646519Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1647182Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1647703Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1648633Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, 
out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1648939Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.1649709Z E1204 10:22:57.722000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1649820Z ('RERUN', {'yellow': True}) [0.3103s] [100%] 2025-12-04T10:35:20.1651122Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T10:35:20.1652049Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1652505Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.1652879Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:20.1653315Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:20.1653708Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.1654157Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.1654610Z E1204 10:22:58.033000 82378 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.1655104Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.1655663Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.1656204Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.1656575Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.1657010Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.1657408Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.1657790Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.1658167Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.1658710Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.1659252Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.1659721Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T10:35:20.1660146Z E1204 10:22:58.033000 82378 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.1660637Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1661090Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T10:35:20.1661577Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1662030Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T10:35:20.1662592Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.1663047Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T10:35:20.1663475Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.1663887Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T10:35:20.1664293Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T10:35:20.1664699Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T10:35:20.1665201Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1665661Z E1204 10:22:58.033000 82378 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T10:35:20.1666149Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.1666590Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T10:35:20.1666954Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T10:35:20.1667412Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T10:35:20.1667779Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T10:35:20.1668187Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T10:35:20.1668633Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T10:35:20.1669036Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T10:35:20.1669474Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T10:35:20.1669971Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1670461Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T10:35:20.1671004Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, 
None].to(tl.float32) 2025-12-04T10:35:20.1671406Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T10:35:20.1671779Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T10:35:20.1672263Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T10:35:20.1672642Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T10:35:20.1673123Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T10:35:20.1673671Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T10:35:20.1674109Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T10:35:20.1674702Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T10:35:20.1675301Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T10:35:20.1675603Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.1677399Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': 
{'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.1677896Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.1678793Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1679369Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1680121Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1680703Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1681453Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1682112Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1682632Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 
2025-12-04T10:35:20.1683570Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1683876Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.1684634Z E1204 10:22:58.033000 82378 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1684723Z FAILED [0.3099s] [100%] 2025-12-04T10:35:20.1684728Z 2025-12-04T10:35:20.1684845Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.1685264Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T10:35:20.1685367Z Traceback (most recent call last): 2025-12-04T10:35:20.1685758Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.1685979Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.1686393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1686613Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1687050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1687211Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1687657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1687784Z mb_compiled_graph = fx_codegen_and_compile( 
2025-12-04T10:35:20.1688248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1688521Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1688965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1689139Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1689550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1689690Z return self._compile_to_module() 2025-12-04T10:35:20.1690108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1690249Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1690702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1690812Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1691232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1691433Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1691937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1692049Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1692491Z File "/tmp/tmpjlk1b9r6/we/cwegshnuygtwiswwxzaf2pjal5zweorw4eqvim6llayn5yzsw7x3.py", line 74, in 2025-12-04T10:35:20.1692885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.1692985Z kernel.precompile( 2025-12-04T10:35:20.1693455Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.1693552Z self._precompile_worker() 2025-12-04T10:35:20.1694060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1694209Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1694718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1694885Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1695264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1695475Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1695977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1696273Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1696472Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.1697025Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1697111Z ^ 2025-12-04T10:35:20.1697505Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1697512Z 2025-12-04T10:35:20.1698124Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1698129Z 2025-12-04T10:35:20.1698133Z 2025-12-04T10:35:20.1698319Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1699110Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T10:35:20.1699120Z 2025-12-04T10:35:20.1699344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1699569Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1699663Z frames [('total', 1)] 2025-12-04T10:35:20.1699758Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1700162Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1700395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1700475Z graph_break [] 2025-12-04T10:35:20.1700813Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T10:35:20.1700917Z Traceback (most recent call last): 2025-12-04T10:35:20.1701274Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.1701481Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.1701891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1702100Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1702542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1702702Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.1703137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1703260Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1703709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1703985Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1704420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1704545Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1704961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1705063Z return self._compile_to_module() 2025-12-04T10:35:20.1705477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1705610Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1706126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1706238Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1706655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1706851Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1707348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1707453Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1708040Z File "/tmp/tmp9y6bp0ro/xt/cxtfgml7cumopkfnyygflis3np74vyxgthfe6e2vihzv2h2hmbwk.py", line 74, in <module>
2025-12-04T10:35:20.1708432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.1708530Z kernel.precompile( 2025-12-04T10:35:20.1709006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.1709099Z self._precompile_worker() 2025-12-04T10:35:20.1709615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1709839Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1710344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1710517Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1710979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1711199Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1711578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1711858Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1712056Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.1712616Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1712699Z ^ 2025-12-04T10:35:20.1713089Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1713097Z 2025-12-04T10:35:20.1713697Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1713702Z 2025-12-04T10:35:20.1713706Z 2025-12-04T10:35:20.1713897Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1714632Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T10:35:20.1714638Z 2025-12-04T10:35:20.1714870Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1715055Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1715140Z frames [('total', 1)] 2025-12-04T10:35:20.1715243Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1715683Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1715887Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1715970Z graph_break [] 2025-12-04T10:35:20.1716259Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1716349Z frames [('total', 1)] 2025-12-04T10:35:20.1716440Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1716625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1717026Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1717106Z graph_break [] 2025-12-04T10:35:20.1717234Z =================================== FAILURES =================================== 2025-12-04T10:35:20.1717565Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T10:35:20.1717669Z Traceback (most recent call last): 2025-12-04T10:35:20.1718038Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.1718235Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.1718658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.1718870Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.1719305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.1719471Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.1720026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.1720145Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.1720642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.1720910Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.1721366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.1721488Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.1721894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.1722004Z return self._compile_to_module() 2025-12-04T10:35:20.1722417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.1722556Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.1722996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.1723106Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.1723535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.1723732Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.1724228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.1724336Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.1724744Z File "/tmp/tmp42c1_5rp/u4/cu4omle2eh76yjdjzlb4zy2vipe7e6uz5ek2bfltn36tqjrkzszq.py", line 74, in <module> 2025-12-04T10:35:20.1725152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.1725240Z kernel.precompile( 2025-12-04T10:35:20.1725717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.1725817Z self._precompile_worker() 2025-12-04T10:35:20.1726325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.1726554Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.1727065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1727236Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1727619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1727828Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:20.1728199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1728491Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1728681Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.1729240Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1729309Z ^ 2025-12-04T10:35:20.1729700Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1729705Z 2025-12-04T10:35:20.1730318Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1730366Z 2025-12-04T10:35:20.1730370Z 2025-12-04T10:35:20.1730550Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1731333Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T10:35:20.1731338Z 2025-12-04T10:35:20.1731567Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1731745Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1731832Z frames [('total', 1)] 2025-12-04T10:35:20.1731928Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1732326Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1732514Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.1732593Z graph_break [] 2025-12-04T10:35:20.1732773Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1732856Z frames [('total', 1)] 2025-12-04T10:35:20.1732948Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1733141Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1733536Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1733618Z graph_break [] 2025-12-04T10:35:20.1733792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1733871Z frames [('total', 1)] 2025-12-04T10:35:20.1733965Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1734149Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1734545Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1734626Z graph_break [] 2025-12-04T10:35:20.1735180Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.xml - 2025-12-04T10:35:20.1735326Z =========================== short test summary info ============================ 2025-12-04T10:35:20.1736177Z FAILED [0.3099s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.1736728Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1736800Z ^ 2025-12-04T10:35:20.1737187Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1737194Z 2025-12-04T10:35:20.1737801Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1737808Z 2025-12-04T10:35:20.1737811Z 2025-12-04T10:35:20.1737986Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1738723Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T10:35:20.1738733Z 2025-12-04T10:35:20.1738954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1739167Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.1739339Z ================== 1 failed, 187 deselected, 2 rerun in 2.41s ================== 2025-12-04T10:35:20.1739416Z Got exit code 1 2025-12-04T10:35:20.1739545Z Retrying single test... 
2025-12-04T10:35:20.1739949Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.xml 2025-12-04T10:35:20.1740080Z ============================= test session starts ============================== 2025-12-04T10:35:20.1740412Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.1740497Z cachedir: .pytest_cache 2025-12-04T10:35:20.1740950Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.1741056Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.1741141Z configfile: pytest.ini 2025-12-04T10:35:20.1741599Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.1741788Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.1742450Z stepcurrent: skipping 27 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T10:35:20.1742549Z Running 1 items in this shard 2025-12-04T10:35:20.1742556Z 2025-12-04T10:35:20.1743785Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T10:35:20.1744715Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1745074Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.1745445Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:20.1745933Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:20.1746320Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.1746865Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.1747319Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.1747808Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.1748307Z E1204 
10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.1748772Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.1749149Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.1749591Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.1749985Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.1750374Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.1750743Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.1751354Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.1751831Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.1752299Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T10:35:20.1752721Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.1753212Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1753666Z E1204 10:23:08.025000 82559 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T10:35:20.1754157Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1754610Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T10:35:20.1755089Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.1755560Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T10:35:20.1756014Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.1756420Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T10:35:20.1756828Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T10:35:20.1757233Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T10:35:20.1757727Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1758265Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T10:35:20.1758749Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.1759150Z E1204 10:23:08.025000 82559 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T10:35:20.1759520Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T10:35:20.1759927Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T10:35:20.1760298Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T10:35:20.1760699Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T10:35:20.1761145Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T10:35:20.1761545Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T10:35:20.1761966Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T10:35:20.1762503Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1763029Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T10:35:20.1763567Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.1763967Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T10:35:20.1764340Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T10:35:20.1764815Z 
E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T10:35:20.1765184Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T10:35:20.1765666Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T10:35:20.1766120Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T10:35:20.1766561Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T10:35:20.1767152Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T10:35:20.1767746Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T10:35:20.1768053Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.1769915Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.1770370Z E1204 
10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.1771254Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1771797Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1772550Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1773129Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1773873Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1774565Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1775118Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1776095Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1776402Z E1204 10:23:08.025000 82559 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.1777165Z E1204 10:23:08.025000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1777276Z ('RERUN', {'yellow': True}) [1.7434s] [100%] 2025-12-04T10:35:20.1778500Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T10:35:20.1779468Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1779822Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.1780188Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:20.1780625Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:20.1781013Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.1781563Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.1782015Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.1782507Z E1204 10:23:08.365000 
82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1783003Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1783473Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1783848Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.1784292Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1784689Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.1785075Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.1785444Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.1786082Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1786566Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1787029Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1787460Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1787951Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1788400Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1788891Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1789346Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1789827Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1790274Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1790701Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1791104Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1791507Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1791908Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1792399Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1792966Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1793449Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1793849Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1794214Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.1794621Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1794991Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.1795394Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1795842Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1796246Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1796667Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1797209Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1797733Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1798273Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1798674Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1799044Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.1799526Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1799892Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.1800385Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1800830Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1801268Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1801858Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1802453Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1802758Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.1804625Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1805081Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1806015Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1806551Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1807305Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1808028Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1808778Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1809542Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1810150Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1811149Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1811477Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1812291Z E1204 10:23:08.365000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1812409Z ('RERUN', {'yellow': True}) [0.3082s] [100%]
2025-12-04T10:35:20.1813722Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.1814712Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1815102Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.1815500Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.1816011Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.1816425Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.1817015Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.1817552Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.1818186Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.1818833Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.1819349Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.1819722Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.1820163Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.1820557Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.1820940Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.1821309Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.1821923Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.1822408Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.1822872Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.1823299Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.1823787Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1824237Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.1824723Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1825175Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.1825668Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1826114Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.1826547Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.1826950Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.1827348Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.1827750Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.1828321Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1829057Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.1829671Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1830196Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.1830668Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.1831207Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.1831748Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.1832310Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.1832963Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.1833552Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.1834179Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.1835173Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.1836071Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.1836882Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.1837451Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.1838014Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.1838757Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.1839317Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.1839993Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.1840653Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.1841317Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.1842224Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.1843100Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.1843561Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.1846435Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.1847165Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.1848477Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1849322Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1850501Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1851384Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1852495Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1853640Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1854540Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.1855952Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1856450Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.1857553Z E1204 10:23:08.675000 82559 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1857698Z FAILED [0.3079s] [100%]
2025-12-04T10:35:20.1857722Z
2025-12-04T10:35:20.1857893Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.1858394Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1858543Z Traceback (most recent call last):
2025-12-04T10:35:20.1859099Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1859445Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1860072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1860357Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1860917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1861148Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1861737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1861920Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1862684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1863081Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1863763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1863946Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1864554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1864712Z return self._compile_to_module()
2025-12-04T10:35:20.1865343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1865618Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1866235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1866415Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1867002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1867305Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1868044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1868310Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1868892Z File "/tmp/tmpscihzwt2/5l/c5lixojicqkkihemc4dhkmp3kh4lt5ommxwfumeppk7vvctzoxen.py", line 74, in <module>
2025-12-04T10:35:20.1869518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1869658Z kernel.precompile(
2025-12-04T10:35:20.1870345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1870495Z self._precompile_worker()
2025-12-04T10:35:20.1871283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1871518Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1872251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1872507Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1873034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1873344Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1873892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1874328Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1874610Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1875489Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1875611Z ^
2025-12-04T10:35:20.1876207Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1876216Z
2025-12-04T10:35:20.1877122Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1877138Z
2025-12-04T10:35:20.1877144Z
2025-12-04T10:35:20.1877432Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1878726Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1878738Z
2025-12-04T10:35:20.1879105Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1879374Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1879558Z frames [('total', 1)]
2025-12-04T10:35:20.1879720Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1880311Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1880621Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1880756Z graph_break []
2025-12-04T10:35:20.1881256Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1881427Z Traceback (most recent call last):
2025-12-04T10:35:20.1881988Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1882289Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1882925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1883257Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1884059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1884318Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1885038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1885252Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1885946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1886359Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1887043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1887246Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1887885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1888052Z return self._compile_to_module()
2025-12-04T10:35:20.1888700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1888920Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1889588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1889786Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1890441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1890749Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1891533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1891707Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1892352Z File "/tmp/tmp4_1w8kz3/uc/cucaizpc6iis4deacgalizwass7uald6fjcjnd673j3ncjzfdlxf.py", line 74, in <module>
2025-12-04T10:35:20.1892930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1893074Z kernel.precompile(
2025-12-04T10:35:20.1893908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1894054Z self._precompile_worker()
2025-12-04T10:35:20.1894676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1894834Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1895341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1895544Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1895950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1896158Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.1896539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.1896825Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.1897033Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.1897591Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.1897663Z ^
2025-12-04T10:35:20.1898136Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.1898142Z
2025-12-04T10:35:20.1898749Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.1898824Z
2025-12-04T10:35:20.1898828Z
2025-12-04T10:35:20.1899019Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.1899866Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda
2025-12-04T10:35:20.1899872Z
2025-12-04T10:35:20.1900640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.1900885Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1901015Z frames [('total', 1)]
2025-12-04T10:35:20.1901173Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1901686Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1901946Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1902075Z graph_break []
2025-12-04T10:35:20.1902319Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.1902443Z frames [('total', 1)]
2025-12-04T10:35:20.1902588Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.1902825Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.1903396Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.1903533Z graph_break []
2025-12-04T10:35:20.1903713Z =================================== FAILURES ===================================
2025-12-04T10:35:20.1904179Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _
2025-12-04T10:35:20.1904332Z Traceback (most recent call last):
2025-12-04T10:35:20.1904802Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.1905095Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.1905654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.1906065Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.1906528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.1906698Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.1907150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.1907292Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.1908022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.1908332Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.1908788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.1915252Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.1915893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.1916047Z return self._compile_to_module()
2025-12-04T10:35:20.1916970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.1917174Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.1918088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.1918282Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.1919004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.1919294Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.1919958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.1920081Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.1920529Z File "/tmp/tmpx3a3cv_r/wg/cwgbdfiglz22weuk7ohiniahjmveykawtvighr3gy7gyp446qejl.py", line 74, in <module>
2025-12-04T10:35:20.1920939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.1921052Z kernel.precompile(
2025-12-04T10:35:20.1921537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.1921646Z self._precompile_worker()
2025-12-04T10:35:20.1922178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.1922339Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.1922873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.1923049Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.1923442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.1923667Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:20.1924059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1924364Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1924571Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.1925139Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1925472Z ^ 2025-12-04T10:35:20.1925907Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1925913Z 2025-12-04T10:35:20.1926542Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1926547Z 2025-12-04T10:35:20.1926554Z 2025-12-04T10:35:20.1926747Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1927505Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T10:35:20.1927522Z 2025-12-04T10:35:20.1927758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1927955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1928062Z frames [('total', 1)] 2025-12-04T10:35:20.1928167Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1928581Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1928787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.1928876Z graph_break [] 2025-12-04T10:35:20.1929117Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1929208Z frames [('total', 1)] 2025-12-04T10:35:20.1929311Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1929514Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1929961Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1930048Z graph_break [] 2025-12-04T10:35:20.1930249Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.1930340Z frames [('total', 1)] 2025-12-04T10:35:20.1930444Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.1930643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.1931046Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.1931138Z graph_break [] 2025-12-04T10:35:20.1931711Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.xml - 2025-12-04T10:35:20.1931863Z =========================== short test summary info ============================ 2025-12-04T10:35:20.1932613Z FAILED [0.3079s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.1933181Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1933265Z ^ 2025-12-04T10:35:20.1933664Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1933669Z 2025-12-04T10:35:20.1934381Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.1934399Z 2025-12-04T10:35:20.1934403Z 2025-12-04T10:35:20.1934595Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.1935347Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T10:35:20.1935352Z 2025-12-04T10:35:20.1935725Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.1935895Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.1936085Z ================== 1 failed, 187 deselected, 2 rerun in 2.39s ================== 2025-12-04T10:35:20.1936168Z Got exit code 1 2025-12-04T10:35:20.1936731Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T10:35:20.1937115Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.1937543Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.xml 2025-12-04T10:35:20.1937693Z ============================= test session starts ============================== 2025-12-04T10:35:20.1938018Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.1938113Z cachedir: .pytest_cache 2025-12-04T10:35:20.1938601Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.1938714Z 
rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.1938811Z configfile: pytest.ini 2025-12-04T10:35:20.1939385Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.1939640Z collecting ... collected 188 items / 28 deselected / 160 selected 2025-12-04T10:35:20.1939768Z stepcurrent: skipping 28 already run items. 2025-12-04T10:35:20.1939878Z Running 160 items in this shard 2025-12-04T10:35:20.1939922Z 2025-12-04T10:35:20.1941176Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.1942180Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1942552Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.1942946Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.1943397Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.1943802Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.1944286Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.1944755Z E1204 10:23:18.945000 82740 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.1945264Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.1945778Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.1946267Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.1946651Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.1947185Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.1947601Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.1948006Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.1948398Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.1948823Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.1949384Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.1950002Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.1950596Z E1204 10:23:18.945000 82740 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.1951065Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.1951615Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.1952059Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.1952510Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.1952890Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.1953307Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.1953689Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.1954089Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.1954552Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.1954959Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.1955415Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.1955929Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 
2025-12-04T10:35:20.1956435Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.1957000Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.1957418Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.1957810Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.1958306Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.1958765Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.1959273Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.1959733Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.1960188Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.1960794Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.1961413Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.1961735Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] 2025-12-04T10:35:20.1963801Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.1964346Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.1965259Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.1965858Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.1966627Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.1967226Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.1968343Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.1969330Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.1970088Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.1971525Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1971957Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.1973329Z E1204 10:23:18.945000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.1973511Z ('RERUN', {'yellow': True}) [1.9610s] [ 0%] 2025-12-04T10:35:20.1975170Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.1976497Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.1977003Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.1977585Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.1978198Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.1978781Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.1979698Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.1980473Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.1981225Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.1981987Z E1204 10:23:19.469000 82740 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.1982714Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.1983287Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.1983968Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.1984972Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.1985886Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.1986486Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.1987118Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.1987942Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.1988854Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.1989730Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.1990436Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.1991297Z E1204 10:23:19.469000 
82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.1991975Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.1992560Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.1993116Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.1993720Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.1994291Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.1994883Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.1995590Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.1996227Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.1996877Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.1997746Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.1998569Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.1999398Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 
2025-12-04T10:35:20.2000017Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.2000599Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.2001322Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.2001891Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.2002607Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.2003312Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.2003979Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.2004895Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.2005881Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.2006350Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2009800Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 
'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2010530Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2011854Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2012674Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2013825Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2014710Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2016013Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2017121Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2017920Z E1204 10:23:19.469000 82740 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2019485Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2019952Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2021098Z E1204 10:23:19.469000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2021289Z ('RERUN', {'yellow': True}) [0.4914s] [ 0%] 2025-12-04T10:35:20.2023123Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.2024610Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2025180Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.2025762Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.2026411Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 
2025-12-04T10:35:20.2027229Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2027946Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2028622Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2029391Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2030138Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.2030846Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2031430Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.2032089Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2032717Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2033297Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2033957Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.2034563Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.2035469Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.2036414Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2037314Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2038006Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.2038691Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.2039331Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2039953Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.2040504Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.2041125Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.2041670Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.2042271Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.2042934Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.2043540Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 
2025-12-04T10:35:20.2044344Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.2045110Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2045828Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.2046670Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.2047296Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.2047873Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.2048631Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.2049180Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.2049935Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.2050621Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.2051378Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.2052334Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 
2025-12-04T10:35:20.2053269Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.2053750Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2056807Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2057532Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2058858Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2059786Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2060928Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2061805Z E1204 10:23:19.965000 82740 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2063073Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2064071Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2064851Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2066325Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2066816Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2067953Z E1204 10:23:19.965000 82740 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2068102Z FAILED [0.4941s] [ 0%] 2025-12-04T10:35:20.2068111Z 2025-12-04T10:35:20.2068293Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.2068885Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T10:35:20.2069052Z Traceback (most recent call last): 2025-12-04T10:35:20.2069673Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2069991Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2070628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2070942Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2071622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2071869Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.2075631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2075920Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2076607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2077036Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2077706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2077909Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2078548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.2078704Z return self._compile_to_module() 2025-12-04T10:35:20.2079325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2079571Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2080225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2080416Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2081309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2081624Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2082494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2082669Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2083331Z File "/tmp/tmp_lq9uezc/nk/cnkstcietkbkskwkvzuxgmyote4ffwvprahqkurchqshgwaa7ztm.py", line 137, in 2025-12-04T10:35:20.2083914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2084071Z kernel.precompile( 2025-12-04T10:35:20.2084814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2084962Z self._precompile_worker() 2025-12-04T10:35:20.2085740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2085974Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2086727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2086996Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2087535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2087961Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2088530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2088974Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2089356Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2090264Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2090381Z ^ 2025-12-04T10:35:20.2090976Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2090986Z 2025-12-04T10:35:20.2091884Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2091899Z 2025-12-04T10:35:20.2092037Z 2025-12-04T10:35:20.2092332Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2093450Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2093466Z 2025-12-04T10:35:20.2093804Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2094111Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2094243Z frames [('total', 1)] 2025-12-04T10:35:20.2094399Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2094993Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2095295Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2095433Z graph_break [] 2025-12-04T10:35:20.2095939Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T10:35:20.2096101Z Traceback (most recent call last): 2025-12-04T10:35:20.2096654Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2096942Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2097687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2098011Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2098669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2098918Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.2099675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2099877Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2100550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2100985Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2101657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2101844Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2102468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2102631Z return self._compile_to_module() 2025-12-04T10:35:20.2103277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2103595Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2104280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2104518Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2105159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2105470Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2106259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2106425Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2107070Z File "/tmp/tmpkl4yim31/lg/clgiw6lh6c2gmqnklcjejrrrlzrz7tvt2kmedr33sxktzyowcohg.py", line 137, in 
2025-12-04T10:35:20.2107986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2108146Z kernel.precompile( 2025-12-04T10:35:20.2108867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2109026Z self._precompile_worker() 2025-12-04T10:35:20.2109786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2110032Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2110784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2111050Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2111636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2111966Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2112550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2112972Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2113274Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2114325Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2114454Z ^ 2025-12-04T10:35:20.2115059Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2115070Z 2025-12-04T10:35:20.2116007Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2116027Z 2025-12-04T10:35:20.2116034Z 2025-12-04T10:35:20.2116318Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2117433Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2117448Z 2025-12-04T10:35:20.2117805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2118101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2118224Z frames [('total', 1)] 2025-12-04T10:35:20.2118378Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2118989Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2119274Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2119541Z graph_break [] 2025-12-04T10:35:20.2119818Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2119953Z frames [('total', 1)] 2025-12-04T10:35:20.2120212Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2120499Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2121084Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2121215Z graph_break [] 2025-12-04T10:35:20.2121406Z =================================== FAILURES =================================== 2025-12-04T10:35:20.2121913Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T10:35:20.2122083Z Traceback (most recent call last): 2025-12-04T10:35:20.2122620Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2123062Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2123719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2124047Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2124698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2124949Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.2125610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2125807Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2126500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2126922Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2127598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2127811Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2128417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2128571Z return self._compile_to_module() 2025-12-04T10:35:20.2129284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2129504Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2130166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2130346Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2130966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2131288Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2132056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2132231Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2132866Z File "/tmp/tmpdds3g8_9/vn/cvn2vo7n7mxdtr6e5zhza3xkubbm6tuglkrgpdglrugi5n7ay5il.py", line 137, in 2025-12-04T10:35:20.2133460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2133622Z kernel.precompile( 2025-12-04T10:35:20.2134336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2134493Z self._precompile_worker() 2025-12-04T10:35:20.2135382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2135618Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2136498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2136744Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2137288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2137584Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.2138104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2138548Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2138844Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2139884Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2140056Z ^ 2025-12-04T10:35:20.2140632Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2140642Z 2025-12-04T10:35:20.2141565Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2141575Z 2025-12-04T10:35:20.2141582Z 2025-12-04T10:35:20.2141875Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2142969Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2142995Z 2025-12-04T10:35:20.2143351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2143633Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2143783Z frames [('total', 1)] 2025-12-04T10:35:20.2143937Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2144545Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2144940Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.2145082Z graph_break [] 2025-12-04T10:35:20.2145368Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2145511Z frames [('total', 1)] 2025-12-04T10:35:20.2145657Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2145946Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2146569Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2146696Z graph_break [] 2025-12-04T10:35:20.2146989Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2147123Z frames [('total', 1)] 2025-12-04T10:35:20.2147276Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2147567Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2148173Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2148309Z graph_break [] 2025-12-04T10:35:20.2149139Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.xml - 2025-12-04T10:35:20.2149374Z =========================== short test summary info ============================ 2025-12-04T10:35:20.2150554Z FAILED [0.4941s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2151557Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2151679Z ^ 2025-12-04T10:35:20.2152280Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2152290Z 2025-12-04T10:35:20.2153207Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2153217Z 2025-12-04T10:35:20.2153232Z 2025-12-04T10:35:20.2153519Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2154744Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2154756Z 2025-12-04T10:35:20.2155122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2155350Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.2155617Z ================== 1 failed, 28 deselected, 2 rerun in 2.98s =================== 2025-12-04T10:35:20.2155745Z Got exit code 1 2025-12-04T10:35:20.2155878Z Retrying single test... 
2025-12-04T10:35:20.2156484Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.xml 2025-12-04T10:35:20.2156702Z ============================= test session starts ============================== 2025-12-04T10:35:20.2157149Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.2157309Z cachedir: .pytest_cache 2025-12-04T10:35:20.2157981Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.2158157Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.2158301Z configfile: pytest.ini 2025-12-04T10:35:20.2159000Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.2159398Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.2160427Z stepcurrent: skipping 28 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2160580Z Running 1 items in this shard 2025-12-04T10:35:20.2160588Z 2025-12-04T10:35:20.2162450Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.2163949Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2164525Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.2165113Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.2165832Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.2166546Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2167234Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2168013Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2168788Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 
2025-12-04T10:35:20.2169557Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.2170278Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2171043Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.2171730Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2172337Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2172942Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2173513Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.2174158Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.2174981Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.2175928Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2176965Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2177758Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + 
(0)) 2025-12-04T10:35:20.2178479Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.2179230Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2179842Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.2180424Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.2181039Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.2181614Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.2182217Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.2182896Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.2183518Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.2184278Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.2185058Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2185902Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.2186745Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = 
triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.2187374Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.2187960Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.2188824Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.2189396Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.2190151Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.2190854Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.2191514Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.2192427Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.2193362Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.2193835Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2197047Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 
'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2197777Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2199131Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2199961Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2201119Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2202008Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2203293Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2204389Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2205192Z E1204 10:23:29.614000 
82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2206738Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2207332Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2208745Z E1204 10:23:29.614000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2208935Z ('RERUN', {'yellow': True}) [1.9510s] [100%] 2025-12-04T10:35:20.2210803Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.2212305Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2212877Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.2213464Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.2214136Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 
2025-12-04T10:35:20.2214879Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2215564Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2216268Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2217036Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2217789Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.2218517Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2219198Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.2219869Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2220482Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2221232Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2221802Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.2222537Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.2223369Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.2224266Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2225156Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2225967Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.2226702Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.2227374Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2227986Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.2228557Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.2229169Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.2229758Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.2230361Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.2231026Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.2231664Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 
2025-12-04T10:35:20.2232407Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.2233174Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2233926Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.2234766Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.2235403Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.2236036Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.2236764Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.2237334Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.2238079Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.2238862Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.2239529Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.2240539Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 
2025-12-04T10:35:20.2241447Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.2241923Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2245093Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2245871Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2247226Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2248056Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2249206Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2250203Z E1204 10:23:30.142000 82964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2251355Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2252497Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2253308Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2254797Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2255291Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2256515Z E1204 10:23:30.142000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2256700Z ('RERUN', {'yellow': True}) [0.4947s] [100%] 2025-12-04T10:35:20.2258661Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.2260375Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2260951Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.2261532Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.2262312Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.2262920Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2263640Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2264334Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2265086Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2265909Z E1204 10:23:30.637000 82964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.2266640Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2267215Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.2267893Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2268518Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2269243Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2269818Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.2270461Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.2271298Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.2272177Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2273080Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2273784Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.2274487Z E1204 10:23:30.637000 
82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.2275162Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2275867Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.2276442Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.2277147Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.2277717Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.2278329Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.2279003Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.2279720Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.2280388Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.2281154Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2281908Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.2282754Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 
2025-12-04T10:35:20.2283394Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.2283963Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.2284734Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.2285300Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.2286106Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.2286881Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.2287546Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.2288474Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.2289401Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.2289891Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2292864Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 
'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2293664Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2295060Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2295919Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2297060Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2298008Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2299225Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2300219Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2301001Z E1204 10:23:30.637000 82964 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2302713Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2303192Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2304335Z E1204 10:23:30.637000 82964 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2304469Z FAILED [0.4935s] [100%] 2025-12-04T10:35:20.2304478Z 2025-12-04T10:35:20.2304726Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.2305238Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T10:35:20.2305400Z Traceback (most recent call last): 2025-12-04T10:35:20.2305935Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2306241Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2306872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2307189Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2307999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2308246Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.2308896Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2309087Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2309762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2310176Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2310954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2311139Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2311839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2311991Z return self._compile_to_module() 2025-12-04T10:35:20.2312614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2312832Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2313490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2313663Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2314382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2314683Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2324936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2325136Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2325809Z File "/tmp/tmpwgcf43a9/bq/cbqrphnunnymv467uo6as7dukw46a3k6d5bvglvs5jhb6ylfyciy.py", line 137, in <module> 2025-12-04T10:35:20.2326429Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2326570Z kernel.precompile( 2025-12-04T10:35:20.2327293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2327440Z self._precompile_worker() 2025-12-04T10:35:20.2328223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2328450Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2329217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2329480Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2330178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2330501Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2331064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2331497Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2331791Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2332707Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2332821Z ^ 2025-12-04T10:35:20.2333418Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2333427Z 2025-12-04T10:35:20.2334344Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2334352Z 2025-12-04T10:35:20.2334359Z 2025-12-04T10:35:20.2334640Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2335804Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2335874Z 2025-12-04T10:35:20.2336226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2336508Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2336690Z frames [('total', 1)] 2025-12-04T10:35:20.2336837Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2337395Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2337697Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2337819Z graph_break [] 2025-12-04T10:35:20.2338307Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T10:35:20.2338466Z Traceback (most recent call last): 2025-12-04T10:35:20.2338983Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2339484Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2340067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2340399Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 
2025-12-04T10:35:20.2341123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2341377Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.2342083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2342290Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2342998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2343447Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2344094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2344270Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2344823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2344971Z return self._compile_to_module() 2025-12-04T10:35:20.2345617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2345848Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2346438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2346601Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2347184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2347476Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2348141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 
35, in _reload_python_module 2025-12-04T10:35:20.2348293Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2348834Z File "/tmp/tmpx34jld5q/r5/cr5uyoyiv73za5p65b7bnhr7jcdy67h45xixxulikvhyeqjcd7wh.py", line 137, in <module> 2025-12-04T10:35:20.2349324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2349439Z kernel.precompile( 2025-12-04T10:35:20.2350024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2350126Z self._precompile_worker() 2025-12-04T10:35:20.2350717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2350873Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2351423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2351596Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2351974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2352177Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2352553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2352837Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2353040Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2353698Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2353774Z ^ 2025-12-04T10:35:20.2354184Z
ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2354189Z 2025-12-04T10:35:20.2354797Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2354802Z 2025-12-04T10:35:20.2354806Z 2025-12-04T10:35:20.2354996Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2355999Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2356011Z 2025-12-04T10:35:20.2356248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2356539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2356632Z frames [('total', 1)] 2025-12-04T10:35:20.2356734Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2357134Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2357378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2357472Z graph_break [] 2025-12-04T10:35:20.2357650Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2357736Z frames [('total', 1)] 2025-12-04T10:35:20.2357839Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2358018Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2358431Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2358513Z graph_break [] 2025-12-04T10:35:20.2358633Z =================================== FAILURES 
=================================== 2025-12-04T10:35:20.2358979Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T10:35:20.2359084Z Traceback (most recent call last): 2025-12-04T10:35:20.2359449Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2359656Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2360072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2360292Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2360774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2360937Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.2361378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2361541Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2362007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2362278Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2362718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2362851Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2363322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2363426Z return self._compile_to_module() 2025-12-04T10:35:20.2363843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in 
_compile_to_module 2025-12-04T10:35:20.2363984Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2364433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2364545Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2364966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2365168Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2365722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2365839Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2366268Z File "/tmp/tmp_achl4kh/lh/clhzoaw7k5fjx7ijd5ieu5lsgnbesjgeljltesnnsgeuntuij5jc.py", line 137, in <module> 2025-12-04T10:35:20.2366662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2366769Z kernel.precompile( 2025-12-04T10:35:20.2367241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2367386Z self._precompile_worker() 2025-12-04T10:35:20.2367900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2368053Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2368570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2368741Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2369121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2369338Z
module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2369711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2370005Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2370200Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2370806Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2370887Z ^ 2025-12-04T10:35:20.2371280Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2371332Z 2025-12-04T10:35:20.2371953Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2371996Z 2025-12-04T10:35:20.2372000Z 2025-12-04T10:35:20.2372185Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2372934Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2372945Z 2025-12-04T10:35:20.2373172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2373353Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2373446Z frames [('total', 1)] 2025-12-04T10:35:20.2373543Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2373980Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2374180Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2374265Z graph_break [] 2025-12-04T10:35:20.2374459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2374544Z frames [('total', 1)] 2025-12-04T10:35:20.2374640Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2374833Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2375225Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2375302Z graph_break [] 2025-12-04T10:35:20.2375488Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2375592Z frames [('total', 1)] 2025-12-04T10:35:20.2375699Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2375908Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2376297Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2376387Z graph_break [] 2025-12-04T10:35:20.2376940Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.xml - 2025-12-04T10:35:20.2377131Z =========================== short test summary info ============================ 2025-12-04T10:35:20.2377861Z FAILED [0.4935s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2378461Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2378540Z ^ 2025-12-04T10:35:20.2378929Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2378936Z 2025-12-04T10:35:20.2379662Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2379673Z 2025-12-04T10:35:20.2379677Z 2025-12-04T10:35:20.2379861Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2380602Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2380607Z 2025-12-04T10:35:20.2380837Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2381041Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.2381214Z ================== 1 failed, 187 deselected, 2 rerun in 2.97s ================== 2025-12-04T10:35:20.2381294Z Got exit code 1 2025-12-04T10:35:20.2381451Z Retrying single test... 
2025-12-04T10:35:20.2381860Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.xml 2025-12-04T10:35:20.2381994Z ============================= test session starts ============================== 2025-12-04T10:35:20.2382292Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.2382389Z cachedir: .pytest_cache 2025-12-04T10:35:20.2382835Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.2382943Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.2383028Z configfile: pytest.ini 2025-12-04T10:35:20.2383535Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.2383731Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.2384411Z stepcurrent: skipping 28 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2384503Z Running 1 items in this shard 2025-12-04T10:35:20.2384517Z 2025-12-04T10:35:20.2385800Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.2386793Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2387164Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.2387544Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.2388038Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.2388432Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2388890Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2389359Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2389853Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 
2025-12-04T10:35:20.2390363Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.2390841Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2391220Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.2391661Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2392057Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2392507Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2392882Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.2393335Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.2393884Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.2394471Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2395097Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.2395554Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + 
(0)) 2025-12-04T10:35:20.2396033Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.2396469Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2396866Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.2397244Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.2397650Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.2398041Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.2398433Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.2398876Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.2399327Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.2399760Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.2400275Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2400775Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.2401325Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = 
triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.2401738Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.2402115Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.2402616Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.2402995Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.2403493Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.2403993Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.2404479Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.2405086Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.2405712Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.2406061Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2408508Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 
'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2408994Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2409886Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2410437Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2411197Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2411850Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2412615Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2413280Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2413829Z E1204 10:23:40.296000 
83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2414820Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2415139Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2415959Z E1204 10:23:40.296000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2416137Z ('RERUN', {'yellow': True}) [1.9588s] [100%] 2025-12-04T10:35:20.2417385Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.2418512Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2418893Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.2419323Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.2419832Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 
2025-12-04T10:35:20.2420227Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.2420693Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2421166Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2421668Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2422175Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.2422650Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2423033Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.2423491Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2423945Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.2424352Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.2424730Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index
2025-12-04T10:35:20.2425150Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15
2025-12-04T10:35:20.2425769Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.2426356Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2426957Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2427403Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.2427890Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.2428399Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2428807Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.2429224Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0
2025-12-04T10:35:20.2429635Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.2430034Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05
2025-12-04T10:35:20.2430428Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.2430880Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.2431327Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.2431756Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.2432278Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2432772Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.2433319Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2433733Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2434115Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0
2025-12-04T10:35:20.2434614Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2434991Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0
2025-12-04T10:35:20.2435573Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2436151Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2436700Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2437466Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2438145Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2438465Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.2440516Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2441073Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2441975Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2442528Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2443286Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2443911Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2444683Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2445347Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2445877Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2446867Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2447528Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2448539Z E1204 10:23:40.824000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2448795Z ('RERUN', {'yellow': True}) [0.4960s] [100%]
2025-12-04T10:35:20.2450792Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.2452162Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2452676Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.2453188Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150
2025-12-04T10:35:20.2453784Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.2454329Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.2454973Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.2455798Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.2456515Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.2457241Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.2457847Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.2458338Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.2458998Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.2459735Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.2460273Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.2460795Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index
2025-12-04T10:35:20.2461371Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15
2025-12-04T10:35:20.2462173Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.2463050Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2463961Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.2464615Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.2465280Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.2466019Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.2466625Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.2467191Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0
2025-12-04T10:35:20.2467813Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.2468376Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05
2025-12-04T10:35:20.2468964Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.2469653Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.2470237Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.2470906Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.2471696Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.2472553Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.2473410Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.2474012Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.2474569Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0
2025-12-04T10:35:20.2475295Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.2475913Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0
2025-12-04T10:35:20.2476724Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.2477381Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.2478046Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.2478924Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.2479804Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.2480285Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.2483466Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.2484222Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.2485595Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2486409Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2487529Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2488403Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2489486Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2490559Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2491332Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.2492773Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2493264Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.2494497Z E1204 10:23:41.318000 83188 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2494657Z FAILED [0.4923s] [100%]
2025-12-04T10:35:20.2494665Z
2025-12-04T10:35:20.2494854Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.2495374Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2495557Z Traceback (most recent call last):
2025-12-04T10:35:20.2496125Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2496458Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2497038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2497357Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2498029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2498288Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2498973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2499284Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2499971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2500505Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2501161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2501365Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2501978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2502153Z return self._compile_to_module()
2025-12-04T10:35:20.2502785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2503011Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2503674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2503866Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2504517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2504837Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2505655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2505935Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2506603Z File "/tmp/tmpdaksqrlq/ph/cphhvmbkzw5mj2i3mnvc2ta236jgrhd623fwou6xswtfm42c5snp.py", line 137, in <module>
2025-12-04T10:35:20.2507180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2507448Z kernel.precompile(
2025-12-04T10:35:20.2508334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2508509Z self._precompile_worker()
2025-12-04T10:35:20.2509304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2509542Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2510292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2510713Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2511294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2511627Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2512205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2512635Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2512940Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2513842Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2513979Z ^
2025-12-04T10:35:20.2514598Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2514617Z
2025-12-04T10:35:20.2515585Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2515610Z
2025-12-04T10:35:20.2515616Z
2025-12-04T10:35:20.2515920Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2517196Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2517208Z
2025-12-04T10:35:20.2517569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2517878Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2518027Z frames [('total', 1)]
2025-12-04T10:35:20.2518184Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2518818Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2519106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2519239Z graph_break []
2025-12-04T10:35:20.2519761Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2519939Z Traceback (most recent call last):
2025-12-04T10:35:20.2520479Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2520783Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2521383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2521689Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2522414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2522651Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2523259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2523542Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2524199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2524641Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2525341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2525546Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2526237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2526476Z return self._compile_to_module()
2025-12-04T10:35:20.2527124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2527341Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2528024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2528205Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2528824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2529123Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2529895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2530069Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2530720Z File "/tmp/tmpxbbpouox/km/ckm4qsvmoobmv5g76ztlzd62cjsxv3yhtad6raop7gduyk5xhu6z.py", line 137, in <module>
2025-12-04T10:35:20.2531286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2531424Z kernel.precompile(
2025-12-04T10:35:20.2532128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2532351Z self._precompile_worker()
2025-12-04T10:35:20.2533107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2533332Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2534075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2534346Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2534910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2535225Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.2535805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.2536301Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.2536607Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.2537515Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.2537638Z ^
2025-12-04T10:35:20.2538234Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.2538343Z
2025-12-04T10:35:20.2539373Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.2539462Z
2025-12-04T10:35:20.2539468Z
2025-12-04T10:35:20.2539757Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.2540851Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda
2025-12-04T10:35:20.2540863Z
2025-12-04T10:35:20.2541226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.2541508Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2541644Z frames [('total', 1)]
2025-12-04T10:35:20.2541809Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2542571Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2542879Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2543018Z graph_break []
2025-12-04T10:35:20.2543301Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.2543437Z frames [('total', 1)]
2025-12-04T10:35:20.2543593Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.2543890Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.2544506Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.2544632Z graph_break []
2025-12-04T10:35:20.2544834Z =================================== FAILURES ===================================
2025-12-04T10:35:20.2545331Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _
2025-12-04T10:35:20.2545488Z Traceback (most recent call last):
2025-12-04T10:35:20.2546040Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.2546359Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.2547021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.2547449Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.2548130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.2548386Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.2549092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.2549283Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.2550006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.2550433Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.2551086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.2551282Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.2551845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.2552003Z return self._compile_to_module()
2025-12-04T10:35:20.2552585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.2552804Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.2553601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.2553778Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.2554490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.2554809Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.2555625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.2555810Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.2556492Z File "/tmp/tmp8dywiimk/pf/cpfhxlm2pnfka2dekuhp4h6as7l6mrjl3kzzij77swb5j3kxxjkx.py", line 137, in <module>
2025-12-04T10:35:20.2557105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.2557261Z kernel.precompile(
2025-12-04T10:35:20.2558064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.2558248Z self._precompile_worker()
2025-12-04T10:35:20.2559024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.2559267Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.2560076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.2560327Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.2560891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.2561223Z
module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2561817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2562285Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2562586Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2563502Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2563631Z ^ 2025-12-04T10:35:20.2564338Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2564351Z 2025-12-04T10:35:20.2565299Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2565311Z 2025-12-04T10:35:20.2565324Z 2025-12-04T10:35:20.2565627Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2566801Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2566820Z 2025-12-04T10:35:20.2567186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2567481Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2567628Z frames [('total', 1)] 2025-12-04T10:35:20.2567775Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2568413Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2568705Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2568833Z graph_break [] 2025-12-04T10:35:20.2569236Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2569366Z frames [('total', 1)] 2025-12-04T10:35:20.2569511Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2569809Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2570492Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2570626Z graph_break [] 2025-12-04T10:35:20.2570932Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2571064Z frames [('total', 1)] 2025-12-04T10:35:20.2571225Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2571515Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2572101Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.2572237Z graph_break [] 2025-12-04T10:35:20.2573181Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.xml - 2025-12-04T10:35:20.2573419Z =========================== short test summary info ============================ 2025-12-04T10:35:20.2574523Z FAILED [0.4923s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2575422Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.2575562Z ^ 2025-12-04T10:35:20.2576164Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2576176Z 2025-12-04T10:35:20.2577041Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2577053Z 2025-12-04T10:35:20.2577058Z 2025-12-04T10:35:20.2577307Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2578158Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2578171Z 2025-12-04T10:35:20.2578487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2578651Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.2578832Z ================== 1 failed, 187 deselected, 2 rerun in 2.98s ================== 2025-12-04T10:35:20.2578916Z Got exit code 1 2025-12-04T10:35:20.2579565Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T10:35:20.2579929Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.2580329Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.xml 2025-12-04T10:35:20.2580477Z ============================= test session starts ============================== 2025-12-04T10:35:20.2580780Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.2580872Z cachedir: .pytest_cache 2025-12-04T10:35:20.2581328Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.2581432Z 
rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.2581520Z configfile: pytest.ini 2025-12-04T10:35:20.2581989Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.2582239Z collecting ... collected 188 items / 29 deselected / 159 selected 2025-12-04T10:35:20.2582362Z stepcurrent: skipping 29 already run items. 2025-12-04T10:35:20.2582455Z Running 159 items in this shard 2025-12-04T10:35:20.2582501Z 2025-12-04T10:35:20.2583788Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.2584726Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2585097Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.2585579Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.2585984Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2586444Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2586904Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2587396Z E1204 10:23:50.801000 83412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2587822Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.2588295Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2588691Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.2589058Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.2589608Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2590111Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2590625Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2591132Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2591584Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2592032Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2592462Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2592865Z E1204 10:23:50.801000 83412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2593260Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2593950Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2594447Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2595011Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2595624Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.2596143Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.2596478Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.2597078Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.2597602Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.2598170Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.2598771Z E1204 10:23:50.801000 83412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.2599174Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.2599583Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.2599979Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.2600521Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.2601007Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.2601470Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.2601964Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2602412Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2602870Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2603285Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2603686Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2604086Z E1204 10:23:50.801000 83412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2604775Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2605276Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.2605704Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.2606179Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.2606609Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.2606994Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.2607420Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.2608195Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.2608747Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.2609197Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.2609698Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2610193Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.2610690Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.2611116Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.2611509Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.2612001Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.2612395Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.2612944Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.2613414Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.2613945Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.2614436Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.2614909Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.2615209Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2617204Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2617718Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2618670Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2619247Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2620010Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2620631Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2621382Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2622044Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T10:35:20.2622560Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2623507Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2623816Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2624591Z E1204 10:23:50.801000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2624703Z ('RERUN', {'yellow': True}) [1.7863s] [ 0%] 2025-12-04T10:35:20.2625952Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.2626887Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2627258Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.2627645Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.2628043Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
rnumel = r0_numel 2025-12-04T10:35:20.2628511Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2628973Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2629471Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2629931Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.2630440Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2630827Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.2631191Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.2631702Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2632245Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2632762Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2633265Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2633721Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 
2025-12-04T10:35:20.2634166Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2634592Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2634996Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2635407Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2636098Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2636591Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2637089Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2637698Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.2638216Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.2638551Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.2639109Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.2639631Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.2640200Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.2640799Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.2641365Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.2641808Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.2642205Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.2642748Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.2643193Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.2643697Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.2644203Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2644655Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2645115Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2645528Z E1204 10:23:51.168000 83412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2645980Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2646377Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2647068Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2647529Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.2647988Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.2648377Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.2648808Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.2649194Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.2649627Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.2650081Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.2650499Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.2650948Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.2651448Z 
E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2651944Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.2652484Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.2652906Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.2653336Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.2653827Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.2654231Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.2654722Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.2655235Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.2655817Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.2656312Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.2656798Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.2657101Z E1204 
10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2659104Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2659614Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2660528Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2661062Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2661843Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2662425Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2663175Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2663843Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2664410Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2665364Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2665721Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2673582Z E1204 10:23:51.168000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2673719Z ('RERUN', {'yellow': True}) [0.3333s] [ 0%] 2025-12-04T10:35:20.2674960Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.2675955Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2676335Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.2676721Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.2677113Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2677578Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2678040Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2678536Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2679008Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.2679484Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, 
R0_BLOCK)[None, :] 2025-12-04T10:35:20.2679871Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.2680234Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.2680746Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2681250Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2681760Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2682254Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2682706Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2683153Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2683623Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2684074Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2684475Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2685167Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', 
other=0.0).to(tl.float32) 2025-12-04T10:35:20.2685669Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2686211Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2686823Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.2687343Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.2687685Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.2688240Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.2688759Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.2689339Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.2689945Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.2690351Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.2690824Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.2691221Z E1204 10:23:51.503000 
83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.2691758Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.2692208Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.2692670Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.2693163Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2693613Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2694070Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2694489Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2694938Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2695336Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2696070Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2696523Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.2696941Z E1204 10:23:51.503000 83412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.2697326Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.2697803Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.2698188Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.2698622Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.2699145Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.2699565Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.2700012Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.2700517Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2701010Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.2701506Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.2701978Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.2702367Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.2702855Z E1204 
10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.2703246Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.2703734Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.2704196Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.2704727Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.2705217Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.2705687Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.2706037Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2708266Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2708816Z E1204 
10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2709769Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2710310Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2711076Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2711656Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2712406Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2713070Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2713587Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2714582Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2714890Z E1204 10:23:51.503000 
83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2715709Z E1204 10:23:51.503000 83412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2715802Z FAILED [0.3334s] [ 0%] 2025-12-04T10:35:20.2715807Z 2025-12-04T10:35:20.2715934Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.2716289Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.2716396Z Traceback (most recent call last): 2025-12-04T10:35:20.2716768Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2716971Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2717383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2717603Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2718790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2718967Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.2719407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2719574Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2720042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2720322Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2720769Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2720906Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2721316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2721473Z return self._compile_to_module() 2025-12-04T10:35:20.2721893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2722035Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2722493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2722605Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2723037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2723232Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2723737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2723855Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2724272Z File "/tmp/tmp_87ew5dr/w7/cw74tqprbz5gx3g3n7v4osjyzut7qflyrn4kazjyhdhemaxm5adp.py", line 65, in 2025-12-04T10:35:20.2724671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2724780Z kernel.precompile( 2025-12-04T10:35:20.2725252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2725409Z self._precompile_worker() 2025-12-04T10:35:20.2725967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 
463, in _precompile_worker 2025-12-04T10:35:20.2726119Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2726635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2726810Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2727200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2727410Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2727785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2728081Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2728275Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2728833Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2728915Z ^ 2025-12-04T10:35:20.2729313Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2729363Z 2025-12-04T10:35:20.2729982Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2730029Z 2025-12-04T10:35:20.2730033Z 2025-12-04T10:35:20.2730221Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2730988Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2730993Z 2025-12-04T10:35:20.2731221Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2731405Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2731505Z frames [('total', 1)] 2025-12-04T10:35:20.2731602Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2732077Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2732268Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2732353Z graph_break [] 2025-12-04T10:35:20.2732704Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.2732811Z Traceback (most recent call last): 2025-12-04T10:35:20.2733174Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2733375Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2733788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2734012Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2734452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2734612Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.2735054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2735180Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2735674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2735956Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2736398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2736525Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2736935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2737037Z return self._compile_to_module() 2025-12-04T10:35:20.2737453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2737590Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2738035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2738143Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2738563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2738765Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2739322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2739481Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2739894Z File "/tmp/tmp8ydh_584/42/c42lcyd6rv2t2ga7l6unyb64xenips7osbnksuoe7y54utn6lbit.py", line 65, in 
2025-12-04T10:35:20.2740288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2740433Z kernel.precompile( 2025-12-04T10:35:20.2740906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2741003Z self._precompile_worker() 2025-12-04T10:35:20.2741516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2741665Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2742179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2742390Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2742772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2742988Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2743359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2743656Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2743848Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2744407Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2744493Z ^ 2025-12-04T10:35:20.2744884Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2744893Z 2025-12-04T10:35:20.2745509Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2745516Z 2025-12-04T10:35:20.2745520Z 2025-12-04T10:35:20.2745704Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2746496Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2746509Z 2025-12-04T10:35:20.2746739Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2746925Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2747017Z frames [('total', 1)] 2025-12-04T10:35:20.2747113Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2747521Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2747718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2747801Z graph_break [] 2025-12-04T10:35:20.2747983Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2748087Z frames [('total', 1)] 2025-12-04T10:35:20.2748189Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2748395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2748796Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2748884Z graph_break [] 2025-12-04T10:35:20.2749019Z =================================== FAILURES =================================== 2025-12-04T10:35:20.2749365Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.2749521Z Traceback (most recent call last): 2025-12-04T10:35:20.2749895Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2750134Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2750571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2750787Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2751234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2751416Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.2751858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2751997Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2752494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2752768Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2753232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2753362Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2753776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2753893Z return self._compile_to_module() 2025-12-04T10:35:20.2754311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2754463Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2754912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2755021Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2755463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2755666Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2756215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2756326Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2756762Z File "/tmp/tmpq6fyvb2n/a5/ca52id2idhnuzt4bfc5ydz3tk3lpmdex4bzcrgy5qzrlundrd3qc.py", line 65, in 2025-12-04T10:35:20.2757203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2757339Z kernel.precompile( 2025-12-04T10:35:20.2757959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2758098Z self._precompile_worker() 2025-12-04T10:35:20.2758676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2758833Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2759340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2759505Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2759889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2760093Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.2760537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2760825Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2761057Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2761617Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2761689Z ^ 2025-12-04T10:35:20.2762083Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2762094Z 2025-12-04T10:35:20.2762702Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2762707Z 2025-12-04T10:35:20.2762711Z 2025-12-04T10:35:20.2762896Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2763701Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2763709Z 2025-12-04T10:35:20.2763935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2764122Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2764209Z frames [('total', 1)] 2025-12-04T10:35:20.2764305Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2764718Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2764905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.2764990Z graph_break [] 2025-12-04T10:35:20.2765167Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2765252Z frames [('total', 1)] 2025-12-04T10:35:20.2765357Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2765542Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2765937Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2766022Z graph_break [] 2025-12-04T10:35:20.2766197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2766326Z frames [('total', 1)] 2025-12-04T10:35:20.2766426Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2766608Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2767006Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2767092Z graph_break [] 2025-12-04T10:35:20.2767655Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.xml - 2025-12-04T10:35:20.2767801Z =========================== short test summary info ============================ 2025-12-04T10:35:20.2768530Z FAILED [0.3334s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2769090Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2769161Z ^ 2025-12-04T10:35:20.2769552Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2769557Z 2025-12-04T10:35:20.2770165Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2770214Z 2025-12-04T10:35:20.2770218Z 2025-12-04T10:35:20.2770398Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2771157Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2771227Z 2025-12-04T10:35:20.2771452Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2771613Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.2771779Z ================== 1 failed, 29 deselected, 2 rerun in 2.49s =================== 2025-12-04T10:35:20.2771864Z Got exit code 1 2025-12-04T10:35:20.2771959Z Retrying single test... 
2025-12-04T10:35:20.2772362Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.xml 2025-12-04T10:35:20.2772561Z ============================= test session starts ============================== 2025-12-04T10:35:20.2772863Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.2772957Z cachedir: .pytest_cache 2025-12-04T10:35:20.2773416Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.2773521Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.2773609Z configfile: pytest.ini 2025-12-04T10:35:20.2774075Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.2774265Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.2774950Z stepcurrent: skipping 29 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2775059Z Running 1 items in this shard 2025-12-04T10:35:20.2775064Z 2025-12-04T10:35:20.2776274Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.2777259Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2777629Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.2778020Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.2778412Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2778862Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2779426Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2779927Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2780356Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.2780826Z E1204 10:24:01.497000 83593 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2781271Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.2781638Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.2782188Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2782695Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2783209Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2783707Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2784201Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2784653Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2785079Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2785486Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2785937Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2786624Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2787077Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2787584Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2788198Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.2788758Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.2789100Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.2789647Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.2790187Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.2790761Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.2791368Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.2791772Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.2792180Z E1204 10:24:01.497000 83593 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.2792627Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.2793166Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.2793663Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.2794128Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.2794622Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2795068Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2795555Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2795974Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2796381Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2796787Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2797474Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2797935Z E1204 10:24:01.497000 83593 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.2798359Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.2798746Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.2799178Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.2799562Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.2800033Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.2800487Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.2800905Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.2801356Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.2801859Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2802359Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.2802854Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.2803275Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.2803672Z 
E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.2804209Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.2804644Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.2805129Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.2805602Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.2806179Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.2806667Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.2807176Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.2807483Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2809765Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 
'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2810232Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2811124Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2811741Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2812500Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2813086Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2813833Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2814493Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2815017Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2816002Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, 
in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2816404Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2817163Z E1204 10:24:01.497000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2817337Z ('RERUN', {'yellow': True}) [1.7968s] [100%] 2025-12-04T10:35:20.2818489Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.2819495Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2819925Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.2820308Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.2820695Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2821149Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2821619Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2822109Z E1204 10:24:01.878000 83593 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2822540Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.2823012Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2823389Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.2823891Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.2824490Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2825151Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2825770Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2826263Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2826716Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2827167Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2827587Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2827996Z E1204 10:24:01.878000 83593 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2828394Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2829146Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2829653Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2830156Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2830764Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.2831281Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.2831661Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.2832216Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.2832742Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.2833317Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.2833930Z E1204 10:24:01.878000 83593 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.2834338Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.2834752Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.2835158Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.2835793Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.2836246Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.2836709Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.2837209Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2837663Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2838111Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2838649Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2839174Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2839583Z E1204 10:24:01.878000 83593 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2840268Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2840787Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.2841249Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.2841637Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.2842064Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.2842446Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.2842868Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.2843365Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.2843782Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.2844228Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.2844732Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2845226Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.2845720Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.2846143Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.2846534Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.2847021Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.2847455Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.2847941Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.2848395Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.2848936Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.2849422Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.2849894Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.2850198Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2852129Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2852664Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2853560Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2854088Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2854885Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2855468Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2856271Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2856934Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T10:35:20.2857449Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2858393Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2858699Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2859604Z E1204 10:24:01.878000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2859718Z ('RERUN', {'yellow': True}) [0.3349s] [100%] 2025-12-04T10:35:20.2860875Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.2861810Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2862178Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.2862565Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.2862951Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
rnumel = r0_numel 2025-12-04T10:35:20.2863402Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2863866Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2864404Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2864872Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.2865345Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2865757Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.2866142Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.2866647Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2867193Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2867707Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2868212Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2868661Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 
2025-12-04T10:35:20.2869107Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2869528Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2869937Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2870339Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2871064Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2871514Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2872014Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2872623Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.2873148Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.2873491Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.2874050Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.2874570Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.2875138Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.2875790Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.2876237Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.2876648Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.2877048Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.2877584Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.2878040Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.2878550Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.2879053Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2879502Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2879953Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2880381Z E1204 10:24:02.213000 83593 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2880785Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2881194Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2881882Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2882385Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.2882806Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.2883197Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.2883629Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.2884021Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.2884447Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.2884904Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.2885323Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.2885824Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.2886323Z 
E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2886866Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.2887361Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.2887822Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.2888215Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.2888705Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.2889099Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.2889625Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.2890081Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.2890621Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.2891111Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.2891579Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.2891879Z E1204 
10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.2893853Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.2894313Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.2895202Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2895764Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2896556Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2897130Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2897883Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2898547Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2899205Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.2900196Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2900504Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.2901269Z E1204 10:24:02.213000 83593 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2901362Z FAILED [0.3341s] [100%] 2025-12-04T10:35:20.2901367Z 2025-12-04T10:35:20.2901555Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.2901910Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.2902017Z Traceback (most recent call last): 2025-12-04T10:35:20.2902384Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2902585Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2903008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2903230Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2903670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2903832Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.2904272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2904392Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2904856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2905128Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2905621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2905750Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2906161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.2906269Z return self._compile_to_module() 2025-12-04T10:35:20.2906688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2906823Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2907270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2907378Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2908193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2908409Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2908906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2909015Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2909449Z File "/tmp/tmpiuodrvvc/ee/ceemmtj5ftz52oo4ru2oymqs5scxwwz63ctjvvrjazhx6mw3w7ol.py", line 65, in <module> 2025-12-04T10:35:20.2909939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2910045Z kernel.precompile( 2025-12-04T10:35:20.2910574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2910672Z self._precompile_worker() 2025-12-04T10:35:20.2911189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2911338Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2911849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2912014Z binary =
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2912448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2912661Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2913035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2913331Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2913522Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2914081Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2914160Z ^ 2025-12-04T10:35:20.2914551Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2914556Z 2025-12-04T10:35:20.2915205Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2915216Z 2025-12-04T10:35:20.2915221Z 2025-12-04T10:35:20.2915478Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2916474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2916488Z 2025-12-04T10:35:20.2916846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2917112Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2917242Z frames [('total', 1)] 2025-12-04T10:35:20.2917371Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2917891Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2918125Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2918210Z graph_break [] 2025-12-04T10:35:20.2918557Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.2918668Z Traceback (most recent call last): 2025-12-04T10:35:20.2919026Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2919228Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2919644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2919856Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2920294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2920519Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.2920966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2921088Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2921585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2921866Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2922318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2922443Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2922851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2922948Z return self._compile_to_module() 2025-12-04T10:35:20.2923406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2923545Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2923984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2924096Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2924515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2924713Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2925208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2925310Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2925795Z File "/tmp/tmpeq2wwevp/qy/cqy3s47ftnsg44gliter2wak4p2qstrv2ijtjlg5mwyzsbmolner.py", line 65, in <module>
2025-12-04T10:35:20.2926192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2926289Z kernel.precompile( 2025-12-04T10:35:20.2926760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2926853Z self._precompile_worker() 2025-12-04T10:35:20.2927411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2927559Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2928061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2928231Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2928614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2928826Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.2929196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2929480Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2929675Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2930231Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2930304Z ^ 2025-12-04T10:35:20.2930692Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2930696Z 2025-12-04T10:35:20.2931303Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2931352Z 2025-12-04T10:35:20.2931363Z 2025-12-04T10:35:20.2931544Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2932335Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2932341Z 2025-12-04T10:35:20.2932579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2932758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2932848Z frames [('total', 1)] 2025-12-04T10:35:20.2932943Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2933342Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2933579Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2933665Z graph_break [] 2025-12-04T10:35:20.2933880Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2934008Z frames [('total', 1)] 2025-12-04T10:35:20.2934136Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2934377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2934878Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2934957Z graph_break [] 2025-12-04T10:35:20.2935082Z =================================== FAILURES =================================== 2025-12-04T10:35:20.2935422Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.2935524Z Traceback (most recent call last): 2025-12-04T10:35:20.2935890Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.2936086Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.2936498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.2936719Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.2937216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.2937383Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.2937820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.2937938Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.2938396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.2938676Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.2939205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.2939329Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.2939735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.2939842Z return self._compile_to_module() 2025-12-04T10:35:20.2940252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.2940390Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.2940827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.2940994Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.2941451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.2941657Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.2942222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.2942331Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.2942766Z File "/tmp/tmpf3miv1l8/xt/cxtrioydmgzln76ly23hxyv3bhaf4bk6byzhnyymkpxm5wwv4owv.py", line 65, in 2025-12-04T10:35:20.2943169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.2943259Z kernel.precompile( 2025-12-04T10:35:20.2943727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.2943873Z self._precompile_worker() 2025-12-04T10:35:20.2944380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.2944548Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.2945105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.2945290Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.2945720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.2945932Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.2946304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.2946598Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.2946793Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2947353Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2947430Z ^ 2025-12-04T10:35:20.2947823Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2947828Z 2025-12-04T10:35:20.2948498Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2948503Z 2025-12-04T10:35:20.2948507Z 2025-12-04T10:35:20.2948689Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2949446Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2949454Z 2025-12-04T10:35:20.2949678Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2949863Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2949946Z frames [('total', 1)] 2025-12-04T10:35:20.2950039Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2950445Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2950630Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.2950708Z graph_break [] 2025-12-04T10:35:20.2950890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2950970Z frames [('total', 1)] 2025-12-04T10:35:20.2951069Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2951301Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2951698Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2951824Z graph_break [] 2025-12-04T10:35:20.2952000Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.2952082Z frames [('total', 1)] 2025-12-04T10:35:20.2952184Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.2952367Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.2952758Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.2952846Z graph_break [] 2025-12-04T10:35:20.2953405Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.xml - 2025-12-04T10:35:20.2953624Z =========================== short test summary info ============================ 2025-12-04T10:35:20.2954404Z FAILED [0.3341s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.2955001Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2955087Z ^ 2025-12-04T10:35:20.2955527Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.2955533Z 2025-12-04T10:35:20.2956217Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.2956222Z 2025-12-04T10:35:20.2956225Z 2025-12-04T10:35:20.2956420Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.2957227Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2957234Z 2025-12-04T10:35:20.2957471Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.2957632Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.2957864Z ================== 1 failed, 187 deselected, 2 rerun in 2.50s ================== 2025-12-04T10:35:20.2957951Z Got exit code 1 2025-12-04T10:35:20.2958162Z Retrying single test... 
2025-12-04T10:35:20.2958567Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.xml 2025-12-04T10:35:20.2958702Z ============================= test session starts ============================== 2025-12-04T10:35:20.2959007Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.2959101Z cachedir: .pytest_cache 2025-12-04T10:35:20.2959548Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.2959660Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.2959750Z configfile: pytest.ini 2025-12-04T10:35:20.2960217Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.2960403Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.2961081Z stepcurrent: skipping 29 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.2961179Z Running 1 items in this shard 2025-12-04T10:35:20.2961183Z 2025-12-04T10:35:20.2962389Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.2963372Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.2963745Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.2964132Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.2964518Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.2965024Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.2965500Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.2966040Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.2966471Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.2966941Z E1204 10:24:12.245000 83774 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.2967319Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.2973136Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.2973675Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2974188Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2974776Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.2975270Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2975733Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2976188Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2976618Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2977024Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2977417Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2978115Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2978560Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.2979219Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2979835Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.2980399Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.2980737Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.2981287Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.2981854Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.2982430Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.2983035Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.2983445Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.2983844Z E1204 10:24:12.245000 83774 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.2984251Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.2984791Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.2985248Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.2985715Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.2986262Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.2986713Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.2987157Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.2987586Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.2987991Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.2988396Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.2989090Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.2989541Z E1204 10:24:12.245000 83774 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.2989974Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.2990435Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.2990866Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.2991297Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.2991717Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.2992180Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.2992596Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.2993042Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.2993585Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.2994084Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.2994587Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.2995046Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.2995570Z 
E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.2996249Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.2996667Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.2997154Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.2997606Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.2998212Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.2998701Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.2999183Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.2999487Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3001425Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 
'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3001881Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3002813Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3003390Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3004148Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3004731Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3005610Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3006272Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3006791Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3008012Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, 
in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3008332Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3009099Z E1204 10:24:12.245000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3009214Z ('RERUN', {'yellow': True}) [1.7780s] [100%] 2025-12-04T10:35:20.3010464Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.3011399Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3011761Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.3012147Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.3012532Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3012983Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3013451Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3013940Z E1204 10:24:12.611000 83774 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3014364Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.3014906Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3015284Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3015758Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.3016272Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3016773Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3017282Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3017828Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3018284Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3018732Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3019212Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3019620Z E1204 10:24:12.611000 83774 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3020013Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3020706Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3021154Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3021663Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3022320Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.3022837Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.3023175Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.3023728Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.3024260Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.3024831Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.3025438Z E1204 10:24:12.611000 83774 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.3025842Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.3026286Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.3026691Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.3027265Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3027727Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.3028195Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.3028701Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3029191Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3029641Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3030073Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3030486Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3030889Z E1204 10:24:12.611000 83774 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3031582Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3032034Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.3032459Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.3032849Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.3033431Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.3033824Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.3034240Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.3034704Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.3035130Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.3035619Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.3036135Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3036632Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.3037141Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.3037557Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.3038000Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.3038486Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.3038922Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.3039408Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.3039869Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.3040403Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.3040933Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.3041414Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.3041713Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3043645Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3044100Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3045030Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3045581Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3046370Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3046956Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3047703Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3048362Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T10:35:20.3048878Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3049871Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3050274Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3051090Z E1204 10:24:12.611000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3051205Z ('RERUN', {'yellow': True}) [0.3334s] [100%] 2025-12-04T10:35:20.3052360Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.3053337Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3053706Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.3054094Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.3054493Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
rnumel = r0_numel 2025-12-04T10:35:20.3054950Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3055412Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3055904Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3056328Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.3056797Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3057220Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3057591Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.3058093Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3058595Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3059175Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3059671Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3060138Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 
2025-12-04T10:35:20.3060634Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3061174Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3061606Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3062063Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3062790Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3063241Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3063749Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3064359Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.3064915Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.3065254Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.3065850Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.3066381Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.3066953Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.3067557Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.3067968Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.3068382Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.3068779Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.3069362Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3069816Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.3070279Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.3070778Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3071230Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3071678Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3072100Z E1204 10:24:12.947000 83774 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3072506Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3072906Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3073640Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3074127Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.3074558Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.3074946Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.3075383Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.3075822Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.3076309Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.3076775Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.3077201Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.3077657Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.3078157Z 
E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3078653Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.3079154Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.3079574Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.3079975Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.3080506Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.3080901Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.3081392Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.3081850Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.3082394Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.3082889Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.3083373Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.3083676Z E1204 
10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3085614Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3086149Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3087039Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3087608Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3088363Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3088944Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3089785Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3090645Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3091180Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3092127Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3092446Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3093278Z E1204 10:24:12.947000 83774 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3093379Z FAILED [0.3342s] [100%] 2025-12-04T10:35:20.3093384Z 2025-12-04T10:35:20.3093514Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.3093879Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.3093993Z Traceback (most recent call last): 2025-12-04T10:35:20.3094362Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3094582Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3095008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3095239Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3095688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3095858Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3096308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3096484Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3097008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3097344Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3097797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3097941Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3098359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.3098466Z return self._compile_to_module() 2025-12-04T10:35:20.3098892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3099109Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3099618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3099737Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3100212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3100421Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3100937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3101052Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3101501Z File "/tmp/tmpdpvpnbt1/hi/chi37at57h7wyjtyeit4oefrahv6osfprn2coaj4v5l45t7tvucz.py", line 65, in 2025-12-04T10:35:20.3101910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3102019Z kernel.precompile( 2025-12-04T10:35:20.3102503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3102612Z self._precompile_worker() 2025-12-04T10:35:20.3103142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3103303Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3103874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3104057Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3104448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3104669Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3105057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3105349Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3105583Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3106177Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3106264Z ^ 2025-12-04T10:35:20.3106666Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3106671Z 2025-12-04T10:35:20.3107289Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3107356Z 2025-12-04T10:35:20.3107360Z 2025-12-04T10:35:20.3107553Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3108624Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.3108725Z 2025-12-04T10:35:20.3108976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3109175Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3109275Z frames [('total', 1)] 2025-12-04T10:35:20.3109379Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3109793Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.3109994Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3110088Z graph_break [] 2025-12-04T10:35:20.3110500Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.3110620Z Traceback (most recent call last): 2025-12-04T10:35:20.3110988Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3111201Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3111624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3111845Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3112300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3112470Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.3112921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3113056Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3113524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3113816Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3114266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3114457Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3114883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.3114992Z return self._compile_to_module() 2025-12-04T10:35:20.3115426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3115598Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3116070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3116196Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3116630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3116839Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3117350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3117464Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3117922Z File "/tmp/tmp4mwz5oo3/ed/ceddn4j5nx7rvgwuipwrbnpefara2clksjxvaadsdvq7tmyue5xk.py", line 65, in 
2025-12-04T10:35:20.3118323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3118526Z kernel.precompile( 2025-12-04T10:35:20.3119017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3119194Z self._precompile_worker() 2025-12-04T10:35:20.3119724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3119881Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3120399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3120594Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3120982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3121206Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3121633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3121928Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3122138Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3122705Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3122790Z ^ 2025-12-04T10:35:20.3123194Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3123199Z 2025-12-04T10:35:20.3123816Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3123823Z 2025-12-04T10:35:20.3123827Z 2025-12-04T10:35:20.3124026Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3124787Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.3124794Z 2025-12-04T10:35:20.3125036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3125273Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3125368Z frames [('total', 1)] 2025-12-04T10:35:20.3125501Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3125939Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.3126139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3126230Z graph_break [] 2025-12-04T10:35:20.3126421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3126517Z frames [('total', 1)] 2025-12-04T10:35:20.3126622Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3126815Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3127227Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.3127313Z graph_break [] 2025-12-04T10:35:20.3127451Z =================================== FAILURES =================================== 2025-12-04T10:35:20.3127804Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.3127913Z Traceback (most recent call last): 2025-12-04T10:35:20.3128286Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3128489Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3128959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3129184Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3129677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3129857Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3130302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3130430Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3130902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3131184Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3131692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3131863Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3132283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.3132400Z return self._compile_to_module() 2025-12-04T10:35:20.3132823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3132966Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3133424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3133539Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3133973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3134188Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3134695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3134818Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3135272Z File "/tmp/tmpn3ouw87o/n5/cn5gxgv7yjyaqengtddidqhodyib6kib2wiqnryavjijb7mvuxdj.py", line 65, in 2025-12-04T10:35:20.3135724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3135826Z kernel.precompile( 2025-12-04T10:35:20.3136307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3136415Z self._precompile_worker() 2025-12-04T10:35:20.3136934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3137094Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3137615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3137791Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3138186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3138403Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.3138786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3139177Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3139380Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3140005Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3140124Z ^ 2025-12-04T10:35:20.3140525Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3140530Z 2025-12-04T10:35:20.3141157Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3141162Z 2025-12-04T10:35:20.3141166Z 2025-12-04T10:35:20.3141358Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3142129Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.3142136Z 2025-12-04T10:35:20.3142432Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3142624Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3142727Z frames [('total', 1)] 2025-12-04T10:35:20.3142829Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3143245Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.3143446Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.3143536Z graph_break [] 2025-12-04T10:35:20.3143727Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3143817Z frames [('total', 1)] 2025-12-04T10:35:20.3143919Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3144115Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3144522Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.3144619Z graph_break [] 2025-12-04T10:35:20.3144806Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3144900Z frames [('total', 1)] 2025-12-04T10:35:20.3145006Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3145197Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3145731Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.3145825Z graph_break [] 2025-12-04T10:35:20.3146392Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.xml - 2025-12-04T10:35:20.3146551Z =========================== short test summary info ============================ 2025-12-04T10:35:20.3147293Z FAILED [0.3342s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3147857Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3147943Z ^ 2025-12-04T10:35:20.3148347Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3148351Z 2025-12-04T10:35:20.3148973Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3148978Z 2025-12-04T10:35:20.3148981Z 2025-12-04T10:35:20.3149171Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3149938Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.3149986Z 2025-12-04T10:35:20.3150224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3150426Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.3150615Z ================== 1 failed, 187 deselected, 2 rerun in 2.48s ================== 2025-12-04T10:35:20.3150705Z Got exit code 1 2025-12-04T10:35:20.3151257Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T10:35:20.3151625Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.3152036Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.xml 2025-12-04T10:35:20.3152233Z ============================= test session starts ============================== 2025-12-04T10:35:20.3152574Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.3152675Z cachedir: .pytest_cache 2025-12-04T10:35:20.3153140Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.3153249Z 
rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.3153351Z configfile: pytest.ini 2025-12-04T10:35:20.3153823Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.3154024Z collecting ... collected 188 items / 30 deselected / 158 selected 2025-12-04T10:35:20.3154155Z stepcurrent: skipping 30 already run items. 2025-12-04T10:35:20.3154256Z Running 158 items in this shard 2025-12-04T10:35:20.3154260Z 2025-12-04T10:35:20.3155523Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.3156651Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3157024Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.3157421Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.3157817Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3158291Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3158792Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3159447Z 
E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3160103Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3160747Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3161279Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3162176Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3162838Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.3163334Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.3163897Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3164362Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3164820Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3165314Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3165783Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3166187Z E1204 10:24:22.994000 83955 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.3166629Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.3167284Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3167891Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3168484Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3168950Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3169414Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.3169805Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.3170238Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.3170627Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.3171057Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.3171521Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.3171946Z E1204 10:24:22.994000 83955 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.3172408Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.3172924Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3173432Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.3174000Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.3174430Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.3174882Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.3175384Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.3175839Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.3176336Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.3176848Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.3177499Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.3178000Z E1204 10:24:22.994000 83955 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.3178454Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.3179128Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.3179461Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3181768Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3182243Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3183152Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3183706Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3184481Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3185067Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3185885Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3186596Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3187259Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3188337Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3188661Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3189475Z E1204 10:24:22.994000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3189593Z ('RERUN', {'yellow': True}) [1.8868s] [ 0%] 2025-12-04T10:35:20.3190847Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.3191916Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3192296Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.3192684Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.3193086Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3193551Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3194060Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3194569Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3195075Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3195565Z E1204 10:24:23.434000 83955 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3195953Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3196498Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3196963Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.3197433Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.3197939Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3198443Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3198945Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3199368Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3199785Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3200200Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.3200704Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.3201426Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3202094Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3202700Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3203162Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3203581Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.3203980Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.3204407Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.3204801Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.3205226Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.3205772Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.3206202Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.3206653Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.3207173Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.3207683Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.3208511Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.3208962Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.3209363Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.3209869Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.3210371Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.3210873Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.3211410Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.3212029Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.3212563Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.3213122Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.3213913Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.3214239Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3216653Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3217129Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3218097Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3218648Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3219472Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3220072Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3220832Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3221506Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3222033Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3223109Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3223483Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3224294Z E1204 10:24:23.434000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3224420Z ('RERUN', {'yellow': True}) [0.4077s] [ 0%] 2025-12-04T10:35:20.3225706Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.3226836Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3227210Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.3227602Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.3228008Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3228469Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3228946Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3229454Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3229968Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3230446Z E1204 10:24:23.843000 83955 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3230885Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3231437Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3231890Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.3232376Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.3232876Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3233339Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3233805Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3234229Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3234658Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3235102Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.3235561Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.3236286Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3236882Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3237484Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3237983Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3238411Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.3238806Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.3239236Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.3239637Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.3240053Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.3240521Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.3240950Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.3241400Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.3241923Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.3242469Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.3242963Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.3243391Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.3243796Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.3244300Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.3244696Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.3245202Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.3245692Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.3246323Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.3246878Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.3247367Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.3247991Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.3248304Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3250623Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3251102Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3252010Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3252558Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3253332Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3253921Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3254728Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3255397Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3255927Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3257006Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3257327Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3258107Z E1204 10:24:23.843000 83955 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3258199Z FAILED [0.4075s] [ 0%] 2025-12-04T10:35:20.3258204Z 2025-12-04T10:35:20.3258383Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.3258747Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T10:35:20.3258861Z Traceback (most recent call last): 2025-12-04T10:35:20.3259326Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3259540Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3259970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3260208Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3260653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3260830Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3261320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3261456Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3261930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3262219Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3262682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3262823Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3263248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module
2025-12-04T10:35:20.3263368Z return self._compile_to_module()
2025-12-04T10:35:20.3263795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3263953Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3264413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3264533Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3264973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3265221Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3265732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3265855Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3266292Z File "/tmp/tmprjg0k_q7/wd/cwdgp3iebwu6yvrowg3ani7upfl4zqiwupk36gvhxuvnslp34u2z.py", line 137, in <module>
2025-12-04T10:35:20.3266703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3266812Z kernel.precompile(
2025-12-04T10:35:20.3267296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3267410Z self._precompile_worker()
2025-12-04T10:35:20.3267930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3268094Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3268630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3268811Z binary =
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3269212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3269478Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3269864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3270207Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3270416Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3271127Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3271223Z ^ 2025-12-04T10:35:20.3271628Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3271633Z 2025-12-04T10:35:20.3272307Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3272315Z 2025-12-04T10:35:20.3272319Z 2025-12-04T10:35:20.3272510Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3273281Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3273286Z 2025-12-04T10:35:20.3273524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3273719Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3273817Z frames [('total', 1)] 2025-12-04T10:35:20.3273922Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3274345Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3274543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3274633Z graph_break [] 2025-12-04T10:35:20.3274991Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T10:35:20.3275102Z Traceback (most recent call last): 2025-12-04T10:35:20.3275474Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3275716Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3276210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3276440Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3276889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3277059Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.3277515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3277644Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3278112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3278400Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3278855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3278995Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3279414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3279525Z return self._compile_to_module()
2025-12-04T10:35:20.3279956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3280146Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3280602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3280757Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3281188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3281399Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3281907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3282033Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3282469Z File "/tmp/tmpf736ua_8/4v/c4vspfek4zdn65oysipklcf5zsstvgb4wxbqjpn3wg444jmx3kwc.py", line 137, in <module>
2025-12-04T10:35:20.3282917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3283025Z kernel.precompile( 2025-12-04T10:35:20.3283507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3283613Z self._precompile_worker() 2025-12-04T10:35:20.3284145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3284303Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3284836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3285015Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3285405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3285639Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3286137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3286547Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3286817Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3287844Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3287987Z ^ 2025-12-04T10:35:20.3295067Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3295082Z 2025-12-04T10:35:20.3295921Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3295942Z 2025-12-04T10:35:20.3295947Z 2025-12-04T10:35:20.3296202Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3296973Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3296979Z 2025-12-04T10:35:20.3297233Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3297431Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3297533Z frames [('total', 1)] 2025-12-04T10:35:20.3297644Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3298062Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3298349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3298435Z graph_break [] 2025-12-04T10:35:20.3298632Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3298801Z frames [('total', 1)] 2025-12-04T10:35:20.3298903Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3299167Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3299581Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3299668Z graph_break [] 2025-12-04T10:35:20.3299806Z =================================== FAILURES =================================== 2025-12-04T10:35:20.3300162Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T10:35:20.3300277Z Traceback (most recent call last): 2025-12-04T10:35:20.3300709Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3300915Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3301361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3301586Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3302036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3302216Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3302661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3302796Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3303269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3303561Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3304023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3304157Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3304575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.3304736Z return self._compile_to_module() 2025-12-04T10:35:20.3305160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3305313Z mod = 
self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3305809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3305932Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3306371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3306575Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3307092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3307206Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3307658Z File "/tmp/tmptwh4ft5l/ws/cwstn6wee6ekvkbrcdhd63wn7ggrie4edg4ixsfv65nh2xrjaqb4.py", line 137, in <module>
2025-12-04T10:35:20.3308337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3308467Z kernel.precompile(
2025-12-04T10:35:20.3309102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3309333Z self._precompile_worker()
2025-12-04T10:35:20.3309854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3310075Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3310593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3310767Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3311171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3311385Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:20.3311779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3312131Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3312339Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3313054Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3313134Z ^ 2025-12-04T10:35:20.3313536Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3313550Z 2025-12-04T10:35:20.3314173Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3314178Z 2025-12-04T10:35:20.3314182Z 2025-12-04T10:35:20.3314375Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3315152Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3315159Z 2025-12-04T10:35:20.3315396Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3315627Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3315740Z frames [('total', 1)] 2025-12-04T10:35:20.3315850Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3316326Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3316526Z aot_autograd [('total', 1), ('autograd_cache_miss', 
1), ('not_ok', 1)]
2025-12-04T10:35:20.3316614Z graph_break []
2025-12-04T10:35:20.3316818Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3316910Z frames [('total', 1)]
2025-12-04T10:35:20.3317025Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3317217Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3317621Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3317720Z graph_break []
2025-12-04T10:35:20.3317911Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3318001Z frames [('total', 1)]
2025-12-04T10:35:20.3318110Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3318305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3318715Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.3318807Z graph_break []
2025-12-04T10:35:20.3319376Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.xml -
2025-12-04T10:35:20.3319581Z =========================== short test summary info ============================
2025-12-04T10:35:20.3320319Z FAILED [0.4075s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3321071Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3321150Z ^
2025-12-04T10:35:20.3321554Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3321559Z
2025-12-04T10:35:20.3322182Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3322189Z
2025-12-04T10:35:20.3322193Z
2025-12-04T10:35:20.3322422Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3323192Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda
2025-12-04T10:35:20.3323200Z
2025-12-04T10:35:20.3323437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3323609Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.3323786Z ================== 1 failed, 30 deselected, 2 rerun in 2.74s ===================
2025-12-04T10:35:20.3323875Z Got exit code 1
2025-12-04T10:35:20.3323981Z Retrying single test...
2025-12-04T10:35:20.3324395Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.xml
2025-12-04T10:35:20.3324546Z ============================= test session starts ==============================
2025-12-04T10:35:20.3324869Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.3324970Z cachedir: .pytest_cache
2025-12-04T10:35:20.3325442Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.3325565Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.3325680Z configfile: pytest.ini
2025-12-04T10:35:20.3326236Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.3326437Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.3327128Z stepcurrent: skipping 30 already run items.
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3327240Z Running 1 items in this shard 2025-12-04T10:35:20.3327248Z 2025-12-04T10:35:20.3328497Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.3329588Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3329962Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.3330364Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.3330810Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3331276Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3331799Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3332311Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3332830Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = 
tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3333312Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3333759Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3334315Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3334776Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.3335263Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.3335770Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3336239Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3336704Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3337128Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3337554Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3338005Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.3338447Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.3339154Z E1204 10:24:33.677000 
84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3339755Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3340351Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3340808Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3341240Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.3341635Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.3342062Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.3342534Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.3342949Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.3343459Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.3343884Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.3344346Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.3344859Z E1204 10:24:33.677000 84157 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3345406Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.3345946Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.3346378Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.3346784Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.3347341Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.3347890Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.3348519Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.3349135Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.3349817Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.3350479Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.3351063Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.3351710Z E1204 10:24:33.677000 84157 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.3352034Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3354292Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3354813Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3355770Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3356356Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3357134Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3357723Z E1204 
10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3358533Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3359203Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3359733Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3360819Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3361137Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3361918Z E1204 10:24:33.677000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3362039Z ('RERUN', {'yellow': True}) [1.8746s] [100%] 2025-12-04T10:35:20.3363332Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.3364411Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3364792Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.3365181Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.3365582Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3366064Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3366535Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3367046Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3367593Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3368073Z E1204 10:24:34.115000 84157 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3368509Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3369062Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3369525Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.3370001Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.3370546Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3371014Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3371473Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3371909Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3372329Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3372737Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.3373182Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.3373837Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3374447Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3375091Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3375567Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3376031Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.3376432Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.3376871Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.3377268Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.3377704Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.3378173Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.3378601Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.3379248Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.3379765Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.3380318Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.3380809Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.3381241Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.3381648Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.3382189Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.3382598Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.3383100Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.3383576Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.3384186Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.3384687Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.3385147Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.3385782Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.3386132Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3388446Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3388932Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3389834Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3390390Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3391170Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3391798Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3392603Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3393276Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3393818Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3394940Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3395269Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3396043Z E1204 10:24:34.115000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3396162Z ('RERUN', {'yellow': True}) [0.4061s] [100%] 2025-12-04T10:35:20.3397415Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.3398487Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3398871Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.3399302Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.3399710Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3400175Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3400653Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3401174Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3401679Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3402166Z E1204 10:24:34.527000 84157 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3402552Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3403107Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3403607Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.3404084Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.3404628Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3405086Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3405552Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3406024Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3406481Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3406893Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.3407327Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.3408862Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3409460Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3410185Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3410787Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3411255Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.3411652Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.3412226Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.3412750Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.3413241Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.3413740Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.3414299Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.3414852Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.3415377Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.3415927Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.3416411Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.3416935Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.3417334Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.3417893Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.3418291Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.3418790Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.3419318Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.3419985Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.3420493Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.3420939Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.3421556Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.3421869Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3424161Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3424642Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3425544Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3426104Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3426875Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3427478Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3428243Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3428922Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3429492Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3430623Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3430943Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3431713Z E1204 10:24:34.527000 84157 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3431827Z FAILED [0.4099s] [100%] 2025-12-04T10:35:20.3431901Z 2025-12-04T10:35:20.3432034Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.3432398Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T10:35:20.3432513Z Traceback (most recent call last): 2025-12-04T10:35:20.3432882Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3433106Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3433531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3433755Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3434216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3434393Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3434843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3434983Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3435458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3435796Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3436295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3436436Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3436858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.3436969Z return self._compile_to_module() 2025-12-04T10:35:20.3437409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3437559Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3438020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3438140Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3438576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3438790Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3439300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3439416Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3439884Z File "/tmp/tmp9h6j786w/qu/cquawwa4gvnmawtegqtb2rddoexidaw7vi3dinwdgotdx3la65zw.py", line 137, in 2025-12-04T10:35:20.3440329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3440475Z kernel.precompile( 2025-12-04T10:35:20.3440955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3441062Z self._precompile_worker() 2025-12-04T10:35:20.3441593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3441752Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3442280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3442455Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3442887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3443110Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3443504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3443798Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3444008Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3444715Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3444803Z ^ 2025-12-04T10:35:20.3445204Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3445211Z 2025-12-04T10:35:20.3445888Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3445902Z 2025-12-04T10:35:20.3445907Z 2025-12-04T10:35:20.3446099Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3446913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3446918Z 2025-12-04T10:35:20.3447164Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3447357Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3447460Z frames [('total', 1)] 2025-12-04T10:35:20.3447563Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3447975Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3448184Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3448272Z graph_break [] 2025-12-04T10:35:20.3448624Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T10:35:20.3448742Z Traceback (most recent call last): 2025-12-04T10:35:20.3449112Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3449325Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3449750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3449970Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3450420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3450635Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.3451082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3451262Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3451729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3452022Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3452473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3452603Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3453029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.3453144Z return self._compile_to_module() 2025-12-04T10:35:20.3453613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3453758Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3454208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3454330Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3454761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3454969Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3455478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3455595Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3456074Z File "/tmp/tmptwfxiuzo/ql/cqlbwfpkcdzcclgwbwdzgvro532w3bgf2ppp6rnm3ybjssmmbl5x.py", line 137, in 
2025-12-04T10:35:20.3456482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3456584Z kernel.precompile( 2025-12-04T10:35:20.3457076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3457180Z self._precompile_worker() 2025-12-04T10:35:20.3457750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3457907Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3458426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3458611Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3459004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3459269Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3459657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3459951Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3460158Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3460862Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3460946Z ^ 2025-12-04T10:35:20.3461354Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3461404Z 2025-12-04T10:35:20.3462027Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3462070Z 2025-12-04T10:35:20.3462074Z 2025-12-04T10:35:20.3462271Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3463031Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3463036Z 2025-12-04T10:35:20.3463283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3463475Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3463567Z frames [('total', 1)] 2025-12-04T10:35:20.3463680Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3464130Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3464341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3464434Z graph_break [] 2025-12-04T10:35:20.3464624Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3464733Z frames [('total', 1)] 2025-12-04T10:35:20.3464839Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3465036Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3465447Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3465536Z graph_break [] 2025-12-04T10:35:20.3465669Z =================================== FAILURES =================================== 2025-12-04T10:35:20.3466035Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T10:35:20.3466149Z Traceback (most recent call last): 2025-12-04T10:35:20.3466523Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3466727Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3467155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3467382Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3467875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3468050Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3468491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3468619Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3469092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3469418Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3470017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3470205Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3470803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.3470943Z return self._compile_to_module() 2025-12-04T10:35:20.3471424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3471619Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3472194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3472407Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3472976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3473296Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3473948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3474125Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3474688Z File "/tmp/tmpqq4fse1_/7m/c7mezmlt7pzyraubputsbizgi6je765fehqvh2onofegcssc3wez.py", line 137, in 2025-12-04T10:35:20.3475216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3475367Z kernel.precompile( 2025-12-04T10:35:20.3476103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3476265Z self._precompile_worker() 2025-12-04T10:35:20.3476988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3477195Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3477941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3478197Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3478771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3479082Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.3479625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3480068Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3480338Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3481212Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3481294Z ^ 2025-12-04T10:35:20.3481768Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3481774Z 2025-12-04T10:35:20.3482400Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3482405Z 2025-12-04T10:35:20.3482409Z 2025-12-04T10:35:20.3482602Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3483368Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3483375Z 2025-12-04T10:35:20.3483609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3483801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3483897Z frames [('total', 1)] 2025-12-04T10:35:20.3484001Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3484415Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3484613Z aot_autograd [('total', 1), ('autograd_cache_miss', 
1), ('not_ok', 1)] 2025-12-04T10:35:20.3484699Z graph_break [] 2025-12-04T10:35:20.3484891Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3485149Z frames [('total', 1)] 2025-12-04T10:35:20.3485255Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3485456Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3485905Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3486001Z graph_break [] 2025-12-04T10:35:20.3486187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3486281Z frames [('total', 1)] 2025-12-04T10:35:20.3486389Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3486581Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3486983Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3487082Z graph_break [] 2025-12-04T10:35:20.3487696Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.xml - 2025-12-04T10:35:20.3487859Z =========================== short test summary info ============================ 2025-12-04T10:35:20.3488598Z FAILED [0.4099s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3489300Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3489386Z ^ 2025-12-04T10:35:20.3489787Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3489792Z 2025-12-04T10:35:20.3490425Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3490431Z 2025-12-04T10:35:20.3490435Z 2025-12-04T10:35:20.3490626Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3491395Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3491400Z 2025-12-04T10:35:20.3491678Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3491839Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.3492026Z ================== 1 failed, 187 deselected, 2 rerun in 2.73s ================== 2025-12-04T10:35:20.3492114Z Got exit code 1 2025-12-04T10:35:20.3492211Z Retrying single test... 
2025-12-04T10:35:20.3492637Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.xml 2025-12-04T10:35:20.3492788Z ============================= test session starts ============================== 2025-12-04T10:35:20.3493103Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.3493202Z cachedir: .pytest_cache 2025-12-04T10:35:20.3493663Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.3493780Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.3493882Z configfile: pytest.ini 2025-12-04T10:35:20.3494354Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.3494563Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.3495252Z stepcurrent: skipping 30 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3495408Z Running 1 items in this shard 2025-12-04T10:35:20.3495413Z 2025-12-04T10:35:20.3496718Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.3497845Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3498217Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.3498650Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.3499114Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3499582Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3500057Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3500562Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3501079Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = 
tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3501562Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3501954Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3502515Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3503015Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.3503494Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.3503994Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3504456Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3504927Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3505358Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3505829Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3506231Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.3506663Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.3507327Z E1204 10:24:44.404000 
84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3508213Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3508896Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3509354Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3509782Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.3510171Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.3510659Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.3511055Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.3511476Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.3511944Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.3512372Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.3512826Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.3513348Z E1204 10:24:44.404000 84359 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3513854Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.3514349Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.3514835Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.3515242Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.3515746Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.3516144Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.3516650Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.3517120Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.3517733Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.3518239Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.3518683Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.3519303Z E1204 10:24:44.404000 84359 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.3519705Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3522000Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3522526Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3523449Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3523994Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3524769Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3525355Z E1204 
10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3526168Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3526839Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3527412Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3528498Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3528822Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3529599Z E1204 10:24:44.404000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3529721Z ('RERUN', {'yellow': True}) [1.8979s] [100%] 2025-12-04T10:35:20.3530971Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.3532137Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3532682Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.3533297Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.3533762Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3534405Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3535022Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3535761Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3536442Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3537105Z E1204 10:24:44.841000 84359 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3537553Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3538216Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3538811Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.3539557Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.3540207Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3540828Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3541511Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3542077Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3542669Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3543215Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.3543799Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.3544706Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3545475Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3546300Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3546946Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3547661Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.3548202Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.3548906Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.3549459Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.3550015Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.3550608Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.3551126Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.3551587Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.3552094Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.3552590Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.3553074Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.3553499Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.3553896Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.3554392Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.3554787Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.3555274Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.3555824Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.3556425Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.3556914Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.3557360Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.3557957Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.3558264Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3560503Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3561041Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3561934Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3562464Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3563268Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3563846Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3564605Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3565255Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3565828Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3566897Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3567205Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3568117Z E1204 10:24:44.841000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3568228Z ('RERUN', {'yellow': True}) [0.4056s] [100%] 2025-12-04T10:35:20.3569468Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.3570526Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3570892Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.3571272Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.3571663Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3572119Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3572619Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3573156Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3573652Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3574124Z E1204 10:24:45.247000 84359 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3574504Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3575076Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3575532Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.3576044Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.3576543Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3576988Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3577538Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3577962Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3578370Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3578772Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.3579270Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.3579961Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3580551Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3581137Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.3581587Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3581998Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.3582391Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.3582804Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.3583183Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.3583597Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.3584093Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.3584577Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.3585018Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.3585544Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.3586073Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.3586593Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.3587026Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.3587425Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.3587913Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.3588302Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.3588787Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.3589247Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.3589850Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.3590347Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.3590827Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.3591426Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.3591735Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3593975Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3594441Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3595336Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3595920Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3596723Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3597302Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3598055Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3598759Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3599285Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3600356Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3600675Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3601440Z E1204 10:24:45.247000 84359 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3601539Z FAILED [0.4046s] [100%] 2025-12-04T10:35:20.3601544Z 2025-12-04T10:35:20.3601664Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.3602007Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T10:35:20.3602119Z Traceback (most recent call last): 2025-12-04T10:35:20.3602516Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3602725Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3603142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3603350Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3603793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3603956Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3604386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3604520Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3604975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3605254Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3605699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3605820Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3606236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.3606385Z return self._compile_to_module() 2025-12-04T10:35:20.3606805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3606980Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3607414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3607528Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3608226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3608426Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3608938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3609049Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3609592Z File "/tmp/tmphfo6cmb_/bz/cbz4aj6wg7oljizcrxvnda3ihrmadpwgczxt5ktckd5lv6bdm6rc.py", line 137, in <module> 2025-12-04T10:35:20.3609987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3610077Z kernel.precompile( 2025-12-04T10:35:20.3610553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3610649Z self._precompile_worker() 2025-12-04T10:35:20.3611162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3611315Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3611827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3612007Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3612392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3612607Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3612979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3613338Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3613536Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3614226Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3614296Z ^ 2025-12-04T10:35:20.3614698Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3614705Z 2025-12-04T10:35:20.3615312Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3615318Z 2025-12-04T10:35:20.3615322Z 2025-12-04T10:35:20.3615513Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3616262Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3616267Z 2025-12-04T10:35:20.3616502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3616684Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3616766Z frames [('total', 1)] 2025-12-04T10:35:20.3616926Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3617332Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3617532Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3617669Z graph_break [] 2025-12-04T10:35:20.3618010Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T10:35:20.3618114Z Traceback (most recent call last): 2025-12-04T10:35:20.3618474Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3618668Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3619129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3619345Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3620455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3620623Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.3626234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3626384Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3626854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3627134Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3627589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3627717Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3628140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.3628245Z return self._compile_to_module() 2025-12-04T10:35:20.3628662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3628810Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3629253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3629438Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3629860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3630058Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3630571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3630681Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3631124Z File "/tmp/tmprzfkhit7/bc/cbcqw6tefexgqfmhlfmsm35v27raw3lgjizd6ai4q4vwem62jst7.py", line 137, in <module>
2025-12-04T10:35:20.3631527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3631617Z kernel.precompile( 2025-12-04T10:35:20.3632111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3632215Z self._precompile_worker() 2025-12-04T10:35:20.3632725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3632884Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3633390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3633611Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3633994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3634241Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3634622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3634916Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3635111Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3635812Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3635891Z ^ 2025-12-04T10:35:20.3636332Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3636339Z 2025-12-04T10:35:20.3636945Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3636952Z 2025-12-04T10:35:20.3636956Z 2025-12-04T10:35:20.3637148Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3637901Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3637907Z 2025-12-04T10:35:20.3638132Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3638330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3638420Z frames [('total', 1)] 2025-12-04T10:35:20.3638527Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3638931Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3639120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3639210Z graph_break [] 2025-12-04T10:35:20.3639395Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3639478Z frames [('total', 1)] 2025-12-04T10:35:20.3639628Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3639812Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3640208Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3640292Z graph_break [] 2025-12-04T10:35:20.3640413Z =================================== FAILURES =================================== 2025-12-04T10:35:20.3640766Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T10:35:20.3640870Z Traceback (most recent call last): 2025-12-04T10:35:20.3641229Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3641435Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3641849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3642070Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3642506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3642669Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3643113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3643281Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3643740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3644051Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3644491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3644624Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3645031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.3645131Z return self._compile_to_module() 2025-12-04T10:35:20.3645551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3645689Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3646176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3646285Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3646711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3646911Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3647411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3647524Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3647945Z File "/tmp/tmp9qv8_m5o/ms/cmsnsj7uefdv2k4uimmgbctlqtbmhqvsjbc764nvmryqbe73lbvq.py", line 137, in 2025-12-04T10:35:20.3648339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3648443Z kernel.precompile( 2025-12-04T10:35:20.3648917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3649017Z self._precompile_worker() 2025-12-04T10:35:20.3649531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3649682Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3650243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3650410Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3650791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3651008Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.3651385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3651675Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3651870Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3652563Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3652648Z ^ 2025-12-04T10:35:20.3653036Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3653042Z 2025-12-04T10:35:20.3653656Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3653704Z 2025-12-04T10:35:20.3653709Z 2025-12-04T10:35:20.3653894Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3654639Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3654718Z 2025-12-04T10:35:20.3654947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3655132Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3655226Z frames [('total', 1)] 2025-12-04T10:35:20.3655326Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3655724Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3655919Z aot_autograd [('total', 1), ('autograd_cache_miss', 
1), ('not_ok', 1)] 2025-12-04T10:35:20.3656004Z graph_break [] 2025-12-04T10:35:20.3656237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3656324Z frames [('total', 1)] 2025-12-04T10:35:20.3656420Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3656616Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3657013Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3657098Z graph_break [] 2025-12-04T10:35:20.3657290Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3657375Z frames [('total', 1)] 2025-12-04T10:35:20.3657470Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3657665Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3658062Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.3658154Z graph_break [] 2025-12-04T10:35:20.3658717Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.xml - 2025-12-04T10:35:20.3658866Z =========================== short test summary info ============================ 2025-12-04T10:35:20.3659673Z FAILED [0.4046s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3660407Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3660489Z ^ 2025-12-04T10:35:20.3660881Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3660889Z 2025-12-04T10:35:20.3661495Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3661507Z 2025-12-04T10:35:20.3661513Z 2025-12-04T10:35:20.3661697Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3662443Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3662448Z 2025-12-04T10:35:20.3662682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3662834Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.3663011Z ================== 1 failed, 187 deselected, 2 rerun in 2.74s ================== 2025-12-04T10:35:20.3663098Z Got exit code 1 2025-12-04T10:35:20.3663680Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T10:35:20.3664045Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.3664494Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.xml 2025-12-04T10:35:20.3664628Z ============================= test session starts ============================== 2025-12-04T10:35:20.3664933Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.3665025Z cachedir: .pytest_cache 2025-12-04T10:35:20.3665477Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.3665580Z 
rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.3665670Z configfile: pytest.ini 2025-12-04T10:35:20.3666183Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.3666380Z collecting ... collected 188 items / 31 deselected / 157 selected 2025-12-04T10:35:20.3666504Z stepcurrent: skipping 31 already run items. 2025-12-04T10:35:20.3666609Z Running 157 items in this shard 2025-12-04T10:35:20.3666614Z 2025-12-04T10:35:20.3667789Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.3668722Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3669099Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.3669485Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.3669874Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3670377Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3670844Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3671337Z E1204 10:24:54.955000 84561 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3671837Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3672309Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3672686Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3673055Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.3673559Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3674066Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3674591Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3675136Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3675633Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3676091Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3676519Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3676922Z E1204 10:24:54.955000 84561 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3677334Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3678048Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3678498Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3679007Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3679621Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.3680158Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.3680504Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.3681040Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.3681538Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.3682128Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.3682753Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, 
tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.3683166Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.3683598Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.3684011Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.3684552Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3685021Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.3685488Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.3685986Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3686479Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3686969Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3687392Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3687804Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3688216Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3688926Z E1204 
10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3689391Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.3689825Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.3690214Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.3690650Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.3691035Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.3691474Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.3691948Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.3692375Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.3692836Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.3693381Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3693895Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.3694368Z E1204 10:24:54.955000 84561 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.3694799Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.3695207Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.3695725Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.3696165Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.3696657Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.3697126Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.3697640Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.3698197Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.3698707Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.3699017Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3701135Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 
'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3701595Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3702497Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3703032Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3703793Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3704386Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3705144Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3705853Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3706376Z E1204 10:24:54.955000 84561 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3707329Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3707638Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3708657Z E1204 10:24:54.955000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3708772Z ('RERUN', {'yellow': True}) [1.7831s] [ 0%] 2025-12-04T10:35:20.3709938Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.3710955Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3711393Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.3711785Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.3712180Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3712643Z E1204 10:24:55.327000 84561 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3713190Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3713685Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3714193Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3714667Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3715067Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3715452Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.3715980Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3716481Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3716995Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3717550Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3718003Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3718450Z E1204 10:24:55.327000 
84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3718872Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3719286Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3719698Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3720355Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3720796Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3721299Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3721907Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.3722467Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.3722845Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.3723375Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.3723871Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 
2025-12-04T10:35:20.3724418Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.3725062Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.3725474Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.3725887Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.3726289Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.3726823Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3727275Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.3727748Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.3728250Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3728701Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3729192Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3729618Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 
2025-12-04T10:35:20.3730018Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3730421Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3731084Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3731540Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.3731959Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.3732349Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.3732777Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.3733208Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.3733636Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.3734131Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.3734549Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.3735006Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.3735511Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3736101Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.3736584Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.3737006Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.3737407Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.3737894Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.3738285Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.3738776Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.3739287Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.3739800Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.3740360Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.3740835Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.3741136Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.3743150Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3743608Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3744508Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3745093Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3745936Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3746527Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3747279Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, 
in make_ir 2025-12-04T10:35:20.3748068Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3748590Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3749528Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3749840Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3750609Z E1204 10:24:55.327000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3750722Z ('RERUN', {'yellow': True}) [0.3392s] [ 0%] 2025-12-04T10:35:20.3751886Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.3752989Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3753370Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.3753760Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.3754145Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.3754606Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3755071Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3755576Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3756122Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3756590Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3756972Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3757380Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.3757882Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3758422Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3758934Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3759424Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3759870Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3760355Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3760776Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3761181Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3761577Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3762229Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', 
other=0.0).to(tl.float32) 2025-12-04T10:35:20.3762669Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3763173Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3763784Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.3764340Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.3764677Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.3765197Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.3765744Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.3766289Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.3766983Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.3767394Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.3767802Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.3768199Z E1204 10:24:55.667000 84561 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.3768787Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3769240Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.3769742Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.3770242Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3770687Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3771137Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3771592Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3771999Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3772401Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3773060Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3773514Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.3773932Z E1204 10:24:55.667000 84561 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.3774321Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.3774746Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.3775133Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.3775626Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.3776108Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.3776526Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.3776972Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.3777478Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3777979Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.3778452Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.3778870Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.3779333Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.3779825Z E1204 10:24:55.667000 
84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.3780260Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.3780782Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.3781236Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.3781746Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.3782237Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.3782768Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.3783079Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3785083Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3785564Z 
E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3786487Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3787016Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3787818Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3788397Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3789145Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3789808Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3790329Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3791259Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3791565Z E1204 
10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3792372Z E1204 10:24:55.667000 84561 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3792496Z FAILED [0.3384s] [ 0%] 2025-12-04T10:35:20.3792501Z 2025-12-04T10:35:20.3792619Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.3792969Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.3793076Z Traceback (most recent call last): 2025-12-04T10:35:20.3793443Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3793642Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3794058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3794319Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3794761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3794926Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3795362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3795483Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3795946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3796222Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3796665Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3796799Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3797209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.3797316Z return self._compile_to_module() 2025-12-04T10:35:20.3797733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3797870Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3798371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3798479Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3798899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3799099Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3799598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3799714Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3800137Z File "/tmp/tmpzjmaw_kz/yw/cyw64nfiorcf2siwfkmktiivuijku7y4kmp6tsf54uxbkikimb66.py", line 65, in <module> 2025-12-04T10:35:20.3800539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3800644Z kernel.precompile( 2025-12-04T10:35:20.3801118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3801218Z self._precompile_worker() 2025-12-04T10:35:20.3801722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 
463, in _precompile_worker 2025-12-04T10:35:20.3801869Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3802429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3802598Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3803021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3803234Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3803608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3803899Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3804091Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3804641Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3804723Z ^ 2025-12-04T10:35:20.3805156Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3805162Z 2025-12-04T10:35:20.3805812Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3805818Z 2025-12-04T10:35:20.3805823Z 2025-12-04T10:35:20.3806018Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3806789Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T10:35:20.3806794Z 2025-12-04T10:35:20.3807023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3807205Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3807302Z frames [('total', 1)] 2025-12-04T10:35:20.3807397Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3808036Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.3808233Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3808313Z graph_break [] 2025-12-04T10:35:20.3808663Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.3808843Z Traceback (most recent call last): 2025-12-04T10:35:20.3809204Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3809400Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3809817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3810032Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3810473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3810638Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.3811078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3811196Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3811650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3811926Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3812368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3812556Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3812965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.3813061Z return self._compile_to_module() 2025-12-04T10:35:20.3813532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3813666Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3814103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3814215Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3814631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3814834Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3815388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3815498Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3815979Z File "/tmp/tmpjujnynkf/tu/ctuoy5iuboo2w6ka63qwlilernpnne76wje6f3sicc5s5ry6t4rs.py", line 65, in <module>
2025-12-04T10:35:20.3816379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3816476Z kernel.precompile(
2025-12-04T10:35:20.3816954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3817056Z self._precompile_worker()
2025-12-04T10:35:20.3817573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3817718Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3818246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3818416Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3818796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3819011Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3819478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3819762Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3819966Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3820523Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3820598Z ^
2025-12-04T10:35:20.3820992Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3820997Z
2025-12-04T10:35:20.3821605Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3821618Z
2025-12-04T10:35:20.3821622Z
2025-12-04T10:35:20.3821806Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3822568Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3822573Z
2025-12-04T10:35:20.3822807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3822990Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3823130Z frames [('total', 1)]
2025-12-04T10:35:20.3823232Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3823636Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3823904Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3823983Z graph_break []
2025-12-04T10:35:20.3824164Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3824263Z frames [('total', 1)]
2025-12-04T10:35:20.3824366Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3824556Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3824959Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3825040Z graph_break []
2025-12-04T10:35:20.3825176Z =================================== FAILURES ===================================
2025-12-04T10:35:20.3825594Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.3825714Z Traceback (most recent call last):
2025-12-04T10:35:20.3826111Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.3826306Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.3826737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.3826947Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.3827385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.3827565Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.3828002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.3828124Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.3828591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.3828867Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.3829364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.3829491Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.3829903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.3830017Z return self._compile_to_module()
2025-12-04T10:35:20.3830435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.3830588Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.3831035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.3831142Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.3831572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.3831769Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.3832271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.3832379Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.3832823Z File "/tmp/tmpgsikz3dy/jp/cjptyxbztdc4hx6s5p4yoya4vwahzgybnb33pe44qoussrewtnpv.py", line 65, in <module>
2025-12-04T10:35:20.3833273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.3833366Z kernel.precompile(
2025-12-04T10:35:20.3833841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.3833981Z self._precompile_worker()
2025-12-04T10:35:20.3834487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.3834646Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.3835155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3835323Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3835763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3836012Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3836396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3836698Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3836893Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3837460Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3837531Z ^
2025-12-04T10:35:20.3837928Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3837933Z
2025-12-04T10:35:20.3838546Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3838556Z
2025-12-04T10:35:20.3838560Z
2025-12-04T10:35:20.3838749Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3839513Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3839521Z
2025-12-04T10:35:20.3839750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3839990Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3840079Z frames [('total', 1)]
2025-12-04T10:35:20.3840175Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3840598Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3840792Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3840876Z graph_break []
2025-12-04T10:35:20.3841068Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3841155Z frames [('total', 1)]
2025-12-04T10:35:20.3841262Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3841444Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3841846Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3841934Z graph_break []
2025-12-04T10:35:20.3842121Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.3842209Z frames [('total', 1)]
2025-12-04T10:35:20.3842313Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.3842502Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.3842897Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.3843027Z graph_break []
2025-12-04T10:35:20.3843581Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.xml -
2025-12-04T10:35:20.3843775Z =========================== short test summary info ============================
2025-12-04T10:35:20.3844524Z FAILED [0.3384s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.3845087Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3845163Z ^
2025-12-04T10:35:20.3845558Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3845567Z
2025-12-04T10:35:20.3846229Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.3846235Z
2025-12-04T10:35:20.3846243Z
2025-12-04T10:35:20.3846434Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.3847204Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3847211Z
2025-12-04T10:35:20.3847434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.3847582Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.3847767Z ================== 1 failed, 31 deselected, 2 rerun in 2.49s ===================
2025-12-04T10:35:20.3847854Z Got exit code 1
2025-12-04T10:35:20.3847956Z Retrying single test...
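Every attempt fails on the same store. The kernel's out_ptr3 is typed *fp8e4nv, which is the Triton type that float8_e4m3fn lowers to in this kernel. The retry's kernel metadata records 'cc': 86, and on that device Triton lists only ('fp8e4b15', 'fp8e5') as supported.

As we understand Triton's NVIDIA backend, fp8e4nv is added to that list only for compute capability 8.9 and newer. That threshold is an assumption; this log does not state it. The sketch below mirrors that rule so a test could be gated on the device before compiling. The helper names triton_fp8_dtypes and can_compile_e4m3fn are hypothetical and are not part of PyTorch or Triton.

```python
"""Hypothetical helper: which Triton fp8 types an NVIDIA GPU is assumed to support.

Assumption (not stated in the CI log): Triton's NVIDIA backend always offers
fp8e4b15 and fp8e5, and adds fp8e4nv (used for torch.float8_e4m3fn) only for
compute capability 8.9 or newer. On cc 8.6 this reproduces the
('fp8e4b15', 'fp8e5') tuple from the error message.
"""


def triton_fp8_dtypes(capability: tuple[int, int]) -> tuple[str, ...]:
    """Return the assumed supported Triton fp8 type names, sorted like the error text."""
    major, minor = capability
    supported = {"fp8e4b15", "fp8e5"}
    # Capability is compared as major * 10 + minor, e.g. (8, 6) -> 86 ('cc': 86 in the log).
    if major * 10 + minor >= 89:
        supported.add("fp8e4nv")
    return tuple(sorted(supported))


def can_compile_e4m3fn(capability: tuple[int, int]) -> bool:
    """True if a kernel storing fp8e4nv (float8_e4m3fn) is assumed to compile on this GPU."""
    return "fp8e4nv" in triton_fp8_dtypes(capability)


def main() -> None:
    # torch is imported lazily so the pure helpers above work without it.
    try:
        import torch
    except ImportError:
        print("torch not installed; nothing to check")
        return
    if not torch.cuda.is_available():
        print("no CUDA device; nothing to check")
        return
    cap = torch.cuda.get_device_capability()
    print(f"cc {cap}: fp8 types {triton_fp8_dtypes(cap)}, "
          f"float8_e4m3fn ok: {can_compile_e4m3fn(cap)}")


if __name__ == "__main__":
    main()
```

The rule is kept in a pure function of the capability tuple so it can be checked on a machine without a GPU. Only main() touches torch.cuda.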
2025-12-04T10:35:20.3848364Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.xml
2025-12-04T10:35:20.3848500Z ============================= test session starts ==============================
2025-12-04T10:35:20.3848809Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.3848902Z cachedir: .pytest_cache
2025-12-04T10:35:20.3849392Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.3849503Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.3849593Z configfile: pytest.ini
2025-12-04T10:35:20.3850064Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.3850253Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.3850949Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.3851053Z Running 1 items in this shard
2025-12-04T10:35:20.3851059Z
2025-12-04T10:35:20.3852226Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3853168Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3853550Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192
2025-12-04T10:35:20.3853996Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096
2025-12-04T10:35:20.3854384Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.3854880Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3855351Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3855899Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3856410Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3856918Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3857294Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.3857664Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.3858249Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3858757Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3859302Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3859792Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3860259Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3860704Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3861183Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.3861593Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.3861984Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.3862654Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3863109Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3863615Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3864228Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3864754Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3865092Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] )
2025-12-04T10:35:20.3865699Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3866277Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3866828Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3867429Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3867838Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3868277Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3868683Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3869221Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3869684Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3870155Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3870659Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3871107Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3871556Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3872059Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.3872591Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.3873173Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.3873837Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3874286Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3874710Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3875100Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0
2025-12-04T10:35:20.3875529Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3875913Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05
2025-12-04T10:35:20.3876329Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3876834Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3877303Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3877748Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3878288Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3878780Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.3879257Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.3879673Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.3880107Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0
2025-12-04T10:35:20.3880593Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.3880981Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0
2025-12-04T10:35:20.3881467Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.3881920Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.3882422Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.3882913Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.3883379Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.3883681Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.3885817Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.3886273Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.3887169Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.3887705Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.3888458Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.3889084Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.3889868Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.3890525Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.3891043Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.3892012Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3892320Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.3893079Z E1204 10:25:05.668000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.3893193Z ('RERUN', {'yellow': True}) [1.8022s] [100%]
2025-12-04T10:35:20.3894357Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.3895282Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.3895672Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192
2025-12-04T10:35:20.3896092Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096
2025-12-04T10:35:20.3896517Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.3896965Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.3897425Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.3897917Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.3898412Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.3898879Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.3899315Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.3899680Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.3900181Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3900684Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3901243Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.3901768Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3902257Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3902704Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3903123Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.3903565Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.3903970Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.3904628Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3905072Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.3905578Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3906180Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.3906707Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.3907047Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] )
2025-12-04T10:35:20.3907565Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.3908391Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.3908944Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.3909544Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.3909948Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None]
2025-12-04T10:35:20.3910357Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None]
2025-12-04T10:35:20.3910750Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None]
2025-12-04T10:35:20.3911284Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.3911735Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.3912196Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.3912747Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.3913249Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.3913695Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.3914113Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.3914511Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.3914902Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.3915638Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.3916113Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.3916539Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.3916928Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0
2025-12-04T10:35:20.3917361Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.3917742Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05
2025-12-04T10:35:20.3918168Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.3918628Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.3919051Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.3919541Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.3920043Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.3920536Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.3921013Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.3921429Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.3921825Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.3922309Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.3922697Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.3923184Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.3923689Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.3924200Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.3924754Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.3925219Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.3925549Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3927623Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': 
'*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3928083Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3928978Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3929513Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3930268Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3936275Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3937135Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3937802Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T10:35:20.3938332Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3939326Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3939644Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3940406Z E1204 10:25:06.041000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3940526Z ('RERUN', {'yellow': True}) [0.3409s] [100%] 2025-12-04T10:35:20.3941696Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.3942712Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3943087Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.3943479Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.3943863Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
rnumel = r0_numel 2025-12-04T10:35:20.3944359Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.3944829Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.3945320Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.3945878Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.3946345Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.3946724Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.3947099Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.3947603Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3948109Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3948668Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.3949159Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3949614Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = 
r0_offset + r0_base 2025-12-04T10:35:20.3950067Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3950490Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3950896Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3951291Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3951947Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3952396Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.3952903Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3953553Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.3954107Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.3954451Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.3954968Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.3955470Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.3956059Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.3956660Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.3957066Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.3957468Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.3957862Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.3958397Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.3958854Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.3959316Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.3959811Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.3960306Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.3960751Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.3961168Z E1204 10:25:06.383000 84742 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.3961579Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.3961977Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.3962637Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.3963087Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.3963515Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.3963901Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.3964375Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.3964762Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.3965221Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.3965682Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.3966152Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.3966599Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.3967142Z E1204 
10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.3967646Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.3968119Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.3968546Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.3968947Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.3969434Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.3969835Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.3970321Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.3970777Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.3971328Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.3971816Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.3972286Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.3972593Z E1204 10:25:06.383000 84742 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.3974611Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.3975065Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.3976051Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3976644Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3977402Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3977990Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3978779Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3979519Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3980044Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.3980980Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3981290Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.3982050Z E1204 10:25:06.383000 84742 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3982143Z FAILED [0.3398s] [100%] 2025-12-04T10:35:20.3982149Z 2025-12-04T10:35:20.3982268Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.3982623Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.3982736Z Traceback (most recent call last): 2025-12-04T10:35:20.3983164Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3983371Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3983788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3984014Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.3984459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.3984623Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.3985070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.3985195Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.3985675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.3985987Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.3986430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.3986566Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.3986973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.3987117Z return self._compile_to_module() 2025-12-04T10:35:20.3987537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.3987721Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.3988171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.3988281Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.3988701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.3988908Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.3989410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.3989561Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.3990015Z File "/tmp/tmppumwox5z/ga/cgaga4fcmswxmfr4dvripvwppumijpm34xl47zyazd2lbidr63sr.py", line 65, in <module> 2025-12-04T10:35:20.3990414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.3990521Z kernel.precompile( 2025-12-04T10:35:20.3990998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.3991098Z self._precompile_worker() 2025-12-04T10:35:20.3991620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.3991775Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.3992289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.3992463Z binary =
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.3992845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.3993064Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.3993441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.3993778Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.3993974Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.3994525Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.3994605Z ^ 2025-12-04T10:35:20.3995002Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.3995012Z 2025-12-04T10:35:20.3995616Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.3995630Z 2025-12-04T10:35:20.3995634Z 2025-12-04T10:35:20.3995817Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.3996585Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T10:35:20.3996590Z 2025-12-04T10:35:20.3996821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.3997003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.3997094Z frames [('total', 1)] 2025-12-04T10:35:20.3997234Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.3997638Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.3997829Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.3997951Z graph_break [] 2025-12-04T10:35:20.3998298Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.3998404Z Traceback (most recent call last): 2025-12-04T10:35:20.3998768Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.3998967Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.3999381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.3999593Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4000078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4000240Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.4000680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4000800Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.4001255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4001535Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4001986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4002108Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4002526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.4002629Z return self._compile_to_module() 2025-12-04T10:35:20.4003062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4003207Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4003653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4003817Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4004242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4004455Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4004962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4005073Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4005527Z File "/tmp/tmpprtorvna/bf/cbfnpwqlaszmm75ijj7mv2mu6lpsnqq6wq6dnli45nfcnss3ezsk.py", line 65, in <module>
2025-12-04T10:35:20.4005970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4006077Z kernel.precompile(
2025-12-04T10:35:20.4006564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4006668Z self._precompile_worker()
2025-12-04T10:35:20.4007186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4007339Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4008127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4008389Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4008773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4009043Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4009423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4009708Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4009915Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4010472Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4010546Z ^
2025-12-04T10:35:20.4010994Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4011002Z
2025-12-04T10:35:20.4011607Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4011615Z
2025-12-04T10:35:20.4011620Z
2025-12-04T10:35:20.4011808Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4012570Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4012575Z
2025-12-04T10:35:20.4012809Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4012992Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4013079Z frames [('total', 1)]
2025-12-04T10:35:20.4013189Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4013597Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4013796Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4013881Z graph_break []
2025-12-04T10:35:20.4014065Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4014161Z frames [('total', 1)]
2025-12-04T10:35:20.4014257Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4014507Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4014909Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4014992Z graph_break []
2025-12-04T10:35:20.4015117Z =================================== FAILURES ===================================
2025-12-04T10:35:20.4015478Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.4015582Z Traceback (most recent call last):
2025-12-04T10:35:20.4015999Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4016199Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4016610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4016829Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4017267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4017441Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4017879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4018073Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4018537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4018849Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4019344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4019470Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4019879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4019985Z return self._compile_to_module()
2025-12-04T10:35:20.4020397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4020533Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4021021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4021130Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4021558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4021754Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4022251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4022364Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4022802Z File "/tmp/tmpj0g4eowh/ms/cmsbtdxe5kc65vlifejvtsqxlhqyiibj4nnpcuvakd7bzcw4xh6y.py", line 65, in <module>
2025-12-04T10:35:20.4023201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4023293Z kernel.precompile(
2025-12-04T10:35:20.4023764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4023862Z self._precompile_worker()
2025-12-04T10:35:20.4024368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4024516Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4025075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4025241Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4025653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4025908Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4026399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4026698Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4026892Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4027459Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4027529Z ^
2025-12-04T10:35:20.4027921Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4027927Z
2025-12-04T10:35:20.4028540Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4028545Z
2025-12-04T10:35:20.4028606Z
2025-12-04T10:35:20.4028788Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4029554Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4029602Z
2025-12-04T10:35:20.4029828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4030010Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4030103Z frames [('total', 1)]
2025-12-04T10:35:20.4030198Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4030604Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4030787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4030868Z graph_break []
2025-12-04T10:35:20.4031054Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4031178Z frames [('total', 1)]
2025-12-04T10:35:20.4031274Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4031459Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4031856Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4031944Z graph_break []
2025-12-04T10:35:20.4032122Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4032204Z frames [('total', 1)]
2025-12-04T10:35:20.4032302Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4032483Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4032877Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4032965Z graph_break []
2025-12-04T10:35:20.4033526Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.xml -
2025-12-04T10:35:20.4033675Z =========================== short test summary info ============================
2025-12-04T10:35:20.4034415Z FAILED [0.3398s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4035009Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4035088Z ^
2025-12-04T10:35:20.4035478Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4035482Z
2025-12-04T10:35:20.4036096Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4036104Z
2025-12-04T10:35:20.4036108Z
2025-12-04T10:35:20.4036290Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4037046Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4037056Z
2025-12-04T10:35:20.4037282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4037437Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.4037619Z ================== 1 failed, 187 deselected, 2 rerun in 2.52s ==================
2025-12-04T10:35:20.4037699Z Got exit code 1
2025-12-04T10:35:20.4037790Z Retrying single test...
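Every run of this test fails the same way: Triton refuses to emit its `fp8e4nv` type, which is what `torch.float8_e4m3fn` lowers to, and the kernel metadata dumped by the retried run records `'cc': 86`. The pure-Python sketch below is a hedged illustration, not PyTorch or Triton code. It assumes Triton's NVIDIA backend enables `fp8e4nv` only at compute capability 8.9 or newer (with `fp8e5` and `fp8e4b15` available below that), and all helper names (`supported_fp8_dtypes`, `e4m3fn_max_finite`, `welford`, `ln_fp8_quant_row`) are hypothetical. It also reconstructs, per row, what the fused kernel `triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0` appears to compute: a Welford layer norm with no affine weights, an amax of absolute values, a multiply by the scale, and a clamp to the e4m3fn range before the fp8 cast.

```python
import math


# Assumption: mirrors Triton's NVIDIA backend rule that fp8e4nv
# (torch.float8_e4m3fn) needs compute capability >= 8.9.
def supported_fp8_dtypes(cc: int) -> tuple:
    """Triton fp8 dtype names for a capability encoded as major * 10 + minor."""
    dtypes = ["fp8e5", "fp8e4b15"]
    if cc >= 89:
        dtypes.append("fp8e4nv")
    return tuple(dtypes)


def e4m3fn_max_finite() -> float:
    # e4m3fn: 4 exponent bits (bias 7), 3 mantissa bits, no infinities.
    # Only exponent 1111 with mantissa 111 is NaN, so the largest finite
    # value uses exponent 1111 and mantissa 110: 2**(15 - 7) * (1 + 6/8).
    return 2.0 ** (0b1111 - 7) * (1 + 0b110 / 8)


def welford(row):
    """One-pass mean and M2 (sum of squared deviations from the mean)."""
    mean, m2, n = 0.0, 0.0, 0
    for x in row:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return mean, m2, n


def ln_fp8_quant_row(row, scale, eps=1e-5):
    """layer_norm (no affine) -> amax(abs) -> * scale -> clamp to e4m3fn range.

    Returns (amax, clamped). The final cast to float8_e4m3fn is omitted
    because plain Python has no fp8 type.
    """
    mean, m2, n = welford(row)
    rstd = 1.0 / math.sqrt(m2 / n + eps)  # biased variance, then rsqrt
    normed = [(x - mean) * rstd for x in row]
    amax = max(abs(v) for v in normed)  # taken before scaling
    bound = e4m3fn_max_finite()
    clamped = [min(max(v * scale, -bound), bound) for v in normed]
    return amax, clamped


if __name__ == "__main__":
    print(supported_fp8_dtypes(86))  # ('fp8e5', 'fp8e4b15'): no fp8e4nv on sm_86
    print(e4m3fn_max_finite())       # 448.0
```

On a real runner the capability could come from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple such as `(8, 6)`; a test gated on `major * 10 + minor >= 89` would be skipped on this `linux.g5.4xlarge.nvidia.gpu` runner instead of failing at Triton compile time.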
2025-12-04T10:35:20.4038202Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.xml
2025-12-04T10:35:20.4038388Z ============================= test session starts ==============================
2025-12-04T10:35:20.4038689Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.4038818Z cachedir: .pytest_cache
2025-12-04T10:35:20.4039263Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.4039373Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.4039468Z configfile: pytest.ini
2025-12-04T10:35:20.4040039Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.4040242Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.4040932Z stepcurrent: skipping 31 already run items.
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T10:35:20.4041083Z Running 1 items in this shard 2025-12-04T10:35:20.4041088Z 2025-12-04T10:35:20.4042253Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.4043198Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.4043579Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.4043964Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.4044367Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4044819Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4045292Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4045828Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4046328Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T10:35:20.4046807Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4047192Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.4047567Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.4048076Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.4048583Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.4049095Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.4049582Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.4050089Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.4050578Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4051001Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4051413Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4051809Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.4052482Z E1204 10:25:16.348000 84923 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.4052974Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4053487Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4054098Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.4054604Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.4054948Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.4055467Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.4055977Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.4056523Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.4057191Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.4057599Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.4058000Z E1204 10:25:16.348000 84923 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.4058413Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.4058950Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.4059461Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.4059934Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.4060423Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.4060885Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.4061379Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4061797Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4062238Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4062630Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.4063293Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.4063741Z E1204 10:25:16.348000 84923 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.4064207Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.4064592Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.4065025Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.4065421Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.4065851Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.4066315Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.4066743Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.4067204Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.4067715Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4068249Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.4068742Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.4069169Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.4069572Z E1204 
10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.4070065Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.4070455Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.4070953Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.4071414Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.4071927Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.4072419Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.4072928Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.4073292Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.4075340Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): 
[['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.4075811Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.4076708Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4077255Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4078013Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4078609Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4079366Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4080043Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4080610Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.4081544Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.4081871Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4082635Z E1204 10:25:16.348000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4082758Z ('RERUN', {'yellow': True}) [1.8046s] [100%] 2025-12-04T10:35:20.4083923Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.4084863Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.4085279Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.4085748Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.4086153Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4086607Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4087072Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * 
XBLOCK 2025-12-04T10:35:20.4087565Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4088096Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4088585Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4088971Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.4089344Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.4089848Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.4090361Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.4090873Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.4091362Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.4091827Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.4092321Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4092746Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 
2025-12-04T10:35:20.4093151Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4093546Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.4094216Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4094663Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4095168Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4095788Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.4096346Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.4096724Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] )
2025-12-04T10:35:20.4097286Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.4097789Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.4098331Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.4098939Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.4099457Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None]
2025-12-04T10:35:20.4099859Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None]
2025-12-04T10:35:20.4100273Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None]
2025-12-04T10:35:20.4100812Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.4101264Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4101734Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.4102228Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.4102690Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.4103140Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4103597Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.4103998Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4104403Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.4105073Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4105521Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4105996Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.4106381Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0
2025-12-04T10:35:20.4106808Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.4107191Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05
2025-12-04T10:35:20.4107608Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.4108410Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.4108900Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.4109346Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.4109846Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4110335Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.4110816Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.4111293Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.4111698Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0
2025-12-04T10:35:20.4112182Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.4112570Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0
2025-12-04T10:35:20.4113059Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.4113514Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.4114029Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.4114515Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.4114980Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.4115343Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4117352Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4117820Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4118706Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4119241Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4120050Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4120665Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4121417Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4122073Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4122587Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4123562Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4123874Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4124636Z E1204 10:25:16.723000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4124757Z ('RERUN', {'yellow': True}) [0.3420s] [100%]
2025-12-04T10:35:20.4125963Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.4126972Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4127345Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192
2025-12-04T10:35:20.4127767Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096
2025-12-04T10:35:20.4128167Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.4128612Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4129077Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4129566Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4130071Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4130543Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4130917Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.4131284Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.4131793Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4132339Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4132895Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.4133384Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.4133841Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.4134290Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4134751Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.4135162Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4135558Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.4136273Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4136720Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4137229Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4137841Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.4138361Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.4138701Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] )
2025-12-04T10:35:20.4139314Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.4139815Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.4140364Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.4140968Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.4141371Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None]
2025-12-04T10:35:20.4141777Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None]
2025-12-04T10:35:20.4142181Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None]
2025-12-04T10:35:20.4142713Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.4143165Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4143670Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.4144221Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.4144678Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.4145120Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4145535Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.4145979Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4146384Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.4147046Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.4147493Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4147919Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.4148305Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0
2025-12-04T10:35:20.4148737Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.4149124Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05
2025-12-04T10:35:20.4149547Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.4150005Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.4150468Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.4150914Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.4151413Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4151909Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.4152389Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.4152809Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.4153208Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0
2025-12-04T10:35:20.4153693Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.4154073Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0
2025-12-04T10:35:20.4154606Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.4155060Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.4155607Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.4156099Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.4156578Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.4156880Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4158925Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4159403Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4160299Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4160845Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4161604Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4162226Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4162987Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4163655Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4164180Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4165122Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4165432Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4166245Z E1204 10:25:17.065000 84923 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4166384Z FAILED [0.3404s] [100%]
2025-12-04T10:35:20.4166391Z
2025-12-04T10:35:20.4166515Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.4166882Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.4167023Z Traceback (most recent call last):
2025-12-04T10:35:20.4167385Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4167606Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4168024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4168241Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4168697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4168907Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4169348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4169476Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4169939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4170227Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4170687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4170820Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4171229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4171333Z return self._compile_to_module()
2025-12-04T10:35:20.4171754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4171892Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4172331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4172445Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4172914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4173127Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4173629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4173736Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4174192Z File "/tmp/tmphkws7to1/bu/cbud6absdf4pp2bsbjheogcffjal6tennyacnnfpntpy72bgetgq.py", line 65, in <module>
2025-12-04T10:35:20.4174596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4174698Z kernel.precompile(
2025-12-04T10:35:20.4175178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4175280Z self._precompile_worker()
2025-12-04T10:35:20.4175806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4175958Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4176473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4176649Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4177077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4177293Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4177712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4178002Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4178212Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4178776Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4178856Z ^
2025-12-04T10:35:20.4179339Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4179344Z
2025-12-04T10:35:20.4180012Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4180017Z
2025-12-04T10:35:20.4180031Z
2025-12-04T10:35:20.4180226Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4180985Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4180993Z
2025-12-04T10:35:20.4181229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4181410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4181496Z frames [('total', 1)]
2025-12-04T10:35:20.4181601Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4182005Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4182212Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4182293Z graph_break []
2025-12-04T10:35:20.4182639Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.4182749Z Traceback (most recent call last):
2025-12-04T10:35:20.4183107Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4183354Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4183778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4183994Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4184436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4184600Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4185032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4185162Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4185635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4185946Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4186384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4186501Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4186911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4187085Z return self._compile_to_module()
2025-12-04T10:35:20.4187504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4187648Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4188129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4188246Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4188673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4188865Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4189368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4189477Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4189958Z File "/tmp/tmpv892o071/ck/cckds2hb4vv22vzh6yfutpmfmk47yxrogjxbfgt63xscwtwn6k52.py", line 65, in <module>
2025-12-04T10:35:20.4190353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4190445Z kernel.precompile(
2025-12-04T10:35:20.4190922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4191020Z self._precompile_worker()
2025-12-04T10:35:20.4191533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4191682Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4192185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4192354Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4192742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4192944Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4193329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4193610Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4193856Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4194411Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4194481Z ^
2025-12-04T10:35:20.4194873Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4194880Z
2025-12-04T10:35:20.4195491Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4195496Z
2025-12-04T10:35:20.4195502Z
2025-12-04T10:35:20.4195711Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4196491Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4196499Z
2025-12-04T10:35:20.4196728Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4196906Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4196991Z frames [('total', 1)]
2025-12-04T10:35:20.4197097Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4197497Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4197726Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4197807Z graph_break []
2025-12-04T10:35:20.4198021Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4198107Z frames [('total', 1)]
2025-12-04T10:35:20.4198205Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4198386Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4198786Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4198865Z graph_break []
2025-12-04T10:35:20.4198987Z =================================== FAILURES ===================================
2025-12-04T10:35:20.4199349Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.4199450Z Traceback (most recent call last):
2025-12-04T10:35:20.4199857Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4200051Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4200469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4200679Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4201115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4201274Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4201714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4201840Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4202302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4202573Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4203019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4203147Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4203594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4203696Z return self._compile_to_module()
2025-12-04T10:35:20.4204106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4204242Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4204685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4204794Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4205215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4205413Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4205912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4206020Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4206451Z File "/tmp/tmpqnqivahn/qs/cqs7xfaba5od4xvdspjvom257aczm2wrow2ufllaxquibskunm4a.py", line 65, in
2025-12-04T10:35:20.4206840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4206938Z kernel.precompile(
2025-12-04T10:35:20.4207407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4207547Z self._precompile_worker()
2025-12-04T10:35:20.4208265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4208484Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4208993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4209159Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4209537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4209743Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4210113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4210474Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4210669Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4211225Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4211303Z ^
2025-12-04T10:35:20.4211694Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4211699Z
2025-12-04T10:35:20.4212318Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4212323Z
2025-12-04T10:35:20.4212327Z
2025-12-04T10:35:20.4212509Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4213288Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4213293Z
2025-12-04T10:35:20.4213529Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4213708Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4213798Z frames [('total', 1)]
2025-12-04T10:35:20.4213896Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4214352Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4214556Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4214639Z graph_break []
2025-12-04T10:35:20.4214835Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4214925Z frames [('total', 1)]
2025-12-04T10:35:20.4215027Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4215214Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4215609Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4215687Z graph_break []
2025-12-04T10:35:20.4215874Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4215957Z frames [('total', 1)]
2025-12-04T10:35:20.4216056Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4216241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4216637Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4216720Z graph_break []
2025-12-04T10:35:20.4217275Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.xml -
2025-12-04T10:35:20.4217487Z =========================== short test summary info ============================
2025-12-04T10:35:20.4218222Z FAILED [0.3404s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4218815Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.4218896Z ^
2025-12-04T10:35:20.4219330Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4219336Z
2025-12-04T10:35:20.4219942Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4219955Z
2025-12-04T10:35:20.4219959Z
2025-12-04T10:35:20.4220183Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4220945Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4220952Z
2025-12-04T10:35:20.4221188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4221349Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.4221524Z ================== 1 failed, 187 deselected, 2 rerun in 2.52s ==================
2025-12-04T10:35:20.4221606Z Got exit code 1
2025-12-04T10:35:20.4222158Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda
2025-12-04T10:35:20.4222511Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.4222920Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.xml
2025-12-04T10:35:20.4223058Z ============================= test session starts ==============================
2025-12-04T10:35:20.4223357Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.4223449Z cachedir: .pytest_cache
2025-12-04T10:35:20.4223967Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
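Every failure above has the same root cause: the kernel metadata reports 'cc': 86 (the A10G GPU behind the linux.g5.4xlarge.nvidia.gpu runner), Triton accepts only fp8e4b15 and fp8e5 at that compute capability, and torch.float8_e4m3fn is emitted as tl.float8e4nv in the generated kernels. A minimal sketch of that gating follows, assuming fp8e4nv needs compute capability 8.9 or newer; the helper names and the threshold constant are hypothetical illustrations, not Triton or PyTorch API.

```python
from typing import Tuple

# Assumption: native e4m3 ("fp8e4nv") requires compute capability >= 8.9.
# This log only confirms that cc 86 rejects it; the cutoff itself is not stated here.
FP8E4NV_MIN_CC = 89


def cc_from_capability(capability: Tuple[int, int]) -> int:
    """Fold a (major, minor) pair, such as the one returned by
    torch.cuda.get_device_capability(), into the two-digit form
    used in the kernel metadata ('cc': 86)."""
    major, minor = capability
    return major * 10 + minor


def triton_fp8_dtypes(cc: int) -> Tuple[str, ...]:
    """Hypothetical helper: fp8 Triton type names assumed to be accepted for `cc`.

    For cc 86 this reproduces the list in the ValueError above; the exact set
    for newer architectures may differ between Triton releases.
    """
    dtypes: Tuple[str, ...] = ("fp8e4b15", "fp8e5")
    if cc >= FP8E4NV_MIN_CC:
        dtypes += ("fp8e4nv",)
    return dtypes


def can_lower_float8_e4m3fn(cc: int) -> bool:
    """float8_e4m3fn tensors are stored as fp8e4nv in the failing kernels,
    so a test using them can only compile where fp8e4nv is accepted."""
    return "fp8e4nv" in triton_fp8_dtypes(cc)


if __name__ == "__main__":
    cc = cc_from_capability((8, 6))  # capability reported for this runner's GPU
    print(cc, triton_fp8_dtypes(cc), can_lower_float8_e4m3fn(cc))
```

On a live machine the tuple would come from torch.cuda.get_device_capability(); keeping the check a pure function of the capability makes it usable as a skip condition without a GPU present.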
2025-12-04T10:35:20.4224070Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.4224160Z configfile: pytest.ini
2025-12-04T10:35:20.4224631Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.4224826Z collecting ... collected 188 items / 32 deselected / 156 selected
2025-12-04T10:35:20.4224964Z stepcurrent: skipping 32 already run items.
2025-12-04T10:35:20.4225060Z Running 156 items in this shard
2025-12-04T10:35:20.4225066Z
2025-12-04T10:35:20.4226297Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4227244Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4227608Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.4227998Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.4228480Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4228911Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.4233795Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4234288Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4234794Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4235294Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4235840Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4236215Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.4236661Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4237070Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.4237461Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4237844Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.4238391Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4238842Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4239320Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4239795Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4240297Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4240749Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4241244Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4241704Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4242185Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4242647Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4243081Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4243495Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4243908Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4244356Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4244903Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4245360Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4245906Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4246304Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4246675Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.4247144Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4247515Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.4247935Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4248383Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4248783Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4249217Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4249712Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4250216Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4250761Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4251217Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4251610Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.4252098Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4252477Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.4252966Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4253418Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4253866Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4254462Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4255072Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4255430Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4257290Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4257797Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4258737Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4259361Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4260134Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4260732Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4261482Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4262155Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4262678Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4263669Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4263981Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4264747Z E1204 10:25:26.963000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4264876Z ('RERUN', {'yellow': True}) [1.7478s] [ 0%]
2025-12-04T10:35:20.4266103Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4267051Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4267415Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.4267807Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.4268317Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4268752Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.4269224Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4269688Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4270209Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4270707Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4271256Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4271643Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.4272092Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4272512Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.4272900Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4273280Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.4273843Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4274296Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4274776Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4275245Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4275785Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4276259Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4276757Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4277220Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4277701Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4278172Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4278607Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4279023Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4279437Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4279897Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4280458Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4280914Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4281401Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4281813Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4282186Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.4282661Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4283030Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.4283445Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4283909Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4284320Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4284770Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4285272Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4285791Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4286371Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4286918Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4287307Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.4287791Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4288169Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.4288654Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4289107Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4289549Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4290153Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4290763Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4291108Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4292904Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4293402Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4294333Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4294880Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4295635Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4296226Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4297081Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4297759Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4298278Z E1204 10:25:27.307000 85104
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.4299318Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4299632Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4300397Z E1204 10:25:27.307000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4300517Z ('RERUN', {'yellow': True}) [0.3110s] [ 0%] 2025-12-04T10:35:20.4301731Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T10:35:20.4302675Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4303033Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.4303409Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:20.4303890Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:20.4304276Z E1204 
10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4304772Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4305234Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4305781Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4306272Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4306784Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4307163Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.4307605Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4308404Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4308790Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4309161Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.4309710Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.4310154Z E1204 10:25:27.619000 85104 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.4310622Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T10:35:20.4311215Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4311708Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4312171Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T10:35:20.4312668Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4313130Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T10:35:20.4313612Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4314072Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T10:35:20.4314500Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.4314911Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T10:35:20.4315318Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T10:35:20.4315782Z E1204 10:25:27.619000 85104 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T10:35:20.4316285Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4316831Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T10:35:20.4317317Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4317724Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T10:35:20.4318088Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T10:35:20.4318562Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T10:35:20.4318929Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T10:35:20.4319334Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T10:35:20.4319794Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T10:35:20.4320195Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T10:35:20.4320629Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T10:35:20.4321123Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 
2025-12-04T10:35:20.4321611Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4322156Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4322605Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4322992Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.4323474Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4323844Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.4324336Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4324786Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4325222Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4325824Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4326431Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4326780Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4328567Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4329073Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4330003Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4330551Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4331314Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4331904Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4332654Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4333325Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4333849Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4334824Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4335140Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4335957Z E1204 10:25:27.619000 85104 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4336056Z FAILED [0.3111s] [ 0%]
2025-12-04T10:35:20.4336061Z
2025-12-04T10:35:20.4336181Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.4336523Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4336628Z Traceback (most recent call last):
2025-12-04T10:35:20.4336993Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4337206Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4337620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4337835Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4338284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4338495Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4338945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4339167Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4339623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4339914Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4340358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4340485Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4340890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4340993Z return self._compile_to_module()
2025-12-04T10:35:20.4341453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4341593Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4342035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4342152Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4342573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4342775Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4343269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4343373Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4343811Z File "/tmp/tmpb0rkkcyh/rv/crv5h2l66ynzs6ygycxuobosay3yagqb4q7fes2zv3g3gw3phsof.py", line 74, in
2025-12-04T10:35:20.4344202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4344301Z kernel.precompile(
2025-12-04T10:35:20.4344772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4344872Z self._precompile_worker()
2025-12-04T10:35:20.4345434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4345584Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4346090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4346267Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4346652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4346864Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4347239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4347525Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4347734Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4348289Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4348372Z ^
2025-12-04T10:35:20.4348768Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4348815Z
2025-12-04T10:35:20.4349425Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4349430Z
2025-12-04T10:35:20.4349481Z
2025-12-04T10:35:20.4349666Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4350402Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4350409Z
2025-12-04T10:35:20.4350642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4350825Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4350910Z frames [('total', 1)]
2025-12-04T10:35:20.4351016Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4351457Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4351658Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4351740Z graph_break []
2025-12-04T10:35:20.4352074Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4352185Z Traceback (most recent call last):
2025-12-04T10:35:20.4352545Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4352743Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4353162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4353375Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4353815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4353980Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4354415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4354545Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4354999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4355323Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4355772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4355893Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4356312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4356411Z return self._compile_to_module()
2025-12-04T10:35:20.4356821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4356964Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4357403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4357514Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4357931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4358129Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4358635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4358742Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4359266Z File "/tmp/tmp1lbqdv8m/dz/cdz4uz74f7wzgmyudsgimgwnztre32ctqknvswv2d6xloqesd2bh.py", line 74, in
2025-12-04T10:35:20.4359658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4359789Z kernel.precompile(
2025-12-04T10:35:20.4360274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4360372Z self._precompile_worker()
2025-12-04T10:35:20.4360885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4361044Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4361552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4361727Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4362160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4362369Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4362758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4363045Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4363252Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4363809Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4363882Z ^
2025-12-04T10:35:20.4364282Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4364289Z
2025-12-04T10:35:20.4364902Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4364906Z
2025-12-04T10:35:20.4364912Z
2025-12-04T10:35:20.4365103Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4365884Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4365933Z
2025-12-04T10:35:20.4366170Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4366350Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4366436Z frames [('total', 1)]
2025-12-04T10:35:20.4366538Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4366946Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4367134Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4367223Z graph_break []
2025-12-04T10:35:20.4367406Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4367503Z frames [('total', 1)]
2025-12-04T10:35:20.4367603Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4367787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4368194Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4368276Z graph_break []
2025-12-04T10:35:20.4368399Z =================================== FAILURES ===================================
2025-12-04T10:35:20.4368734Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _
2025-12-04T10:35:20.4368836Z Traceback (most recent call last):
2025-12-04T10:35:20.4369241Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4369446Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4369900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4370117Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4370557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4370718Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4371161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4371286Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4371786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4372064Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4372513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4372645Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4373055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4373164Z return self._compile_to_module()
2025-12-04T10:35:20.4373576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4373719Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4374177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4374290Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4374723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4374928Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4375433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4375618Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4376077Z File "/tmp/tmpnrq81prz/26/c26wb3xem57peeajq4chhxkigcxnyz6uo2d2zp6fmb6yl4ynck5x.py", line 74, in
2025-12-04T10:35:20.4376480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4376577Z kernel.precompile(
2025-12-04T10:35:20.4377052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4377164Z self._precompile_worker()
2025-12-04T10:35:20.4377675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4377827Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4378354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4378525Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4378906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4379166Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4379544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4379878Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4380070Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4380668Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4380746Z ^
2025-12-04T10:35:20.4381139Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4381144Z
2025-12-04T10:35:20.4381755Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4381759Z
2025-12-04T10:35:20.4381763Z
2025-12-04T10:35:20.4381945Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4382723Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4382735Z
2025-12-04T10:35:20.4382962Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4383141Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4383228Z frames [('total', 1)]
2025-12-04T10:35:20.4383323Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4383724Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4383922Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4383999Z graph_break []
2025-12-04T10:35:20.4384185Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4384267Z frames [('total', 1)]
2025-12-04T10:35:20.4384362Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4384552Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4384945Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4385026Z graph_break []
2025-12-04T10:35:20.4385213Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4385297Z frames [('total', 1)]
2025-12-04T10:35:20.4385393Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4385628Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4386027Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.4386110Z graph_break []
2025-12-04T10:35:20.4386671Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.xml -
2025-12-04T10:35:20.4386817Z =========================== short test summary info ============================
2025-12-04T10:35:20.4387528Z FAILED [0.3111s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4388078Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4388148Z ^
2025-12-04T10:35:20.4388537Z ValueError("type fp8e4nv not supported in this architecture.
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4388542Z 2025-12-04T10:35:20.4389147Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4389156Z 2025-12-04T10:35:20.4389202Z 2025-12-04T10:35:20.4389384Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4390112Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4390159Z 2025-12-04T10:35:20.4390386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4390533Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.4390708Z ================== 1 failed, 32 deselected, 2 rerun in 2.40s =================== 2025-12-04T10:35:20.4390787Z Got exit code 1 2025-12-04T10:35:20.4390873Z Retrying single test... 
2025-12-04T10:35:20.4391276Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.xml 2025-12-04T10:35:20.4391408Z ============================= test session starts ============================== 2025-12-04T10:35:20.4391742Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.4391836Z cachedir: .pytest_cache 2025-12-04T10:35:20.4392286Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.4392393Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.4392481Z configfile: pytest.ini 2025-12-04T10:35:20.4392939Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.4393126Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.4393784Z stepcurrent: skipping 32 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4393880Z Running 1 items in this shard 2025-12-04T10:35:20.4393894Z 2025-12-04T10:35:20.4395115Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T10:35:20.4396175Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4396541Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.4396908Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:20.4397350Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:20.4397736Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4398188Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4398653Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4399146Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4399643Z E1204 
10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4400113Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4400531Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.4400967Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4401403Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4401875Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4402246Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.4402789Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.4403275Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.4403735Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T10:35:20.4404165Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4404655Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4405110Z E1204 10:25:37.684000 85285 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T10:35:20.4405601Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4406101Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T10:35:20.4406583Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4407036Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T10:35:20.4407509Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.4408143Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T10:35:20.4408549Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T10:35:20.4408966Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T10:35:20.4409464Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4409923Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T10:35:20.4410408Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4410814Z E1204 10:25:37.684000 85285 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T10:35:20.4411176Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T10:35:20.4411590Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T10:35:20.4412037Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T10:35:20.4412438Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T10:35:20.4412942Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T10:35:20.4413343Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T10:35:20.4413766Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T10:35:20.4414260Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4414796Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T10:35:20.4415336Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4415792Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T10:35:20.4416166Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T10:35:20.4416653Z 
E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T10:35:20.4417023Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T10:35:20.4417511Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T10:35:20.4417963Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T10:35:20.4418395Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T10:35:20.4419087Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T10:35:20.4419684Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T10:35:20.4419984Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.4421769Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.4422231Z E1204 
10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.4423114Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4423692Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4424442Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4425056Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4425814Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4426507Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4427032Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.4427963Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4428271Z E1204 10:25:37.684000 85285 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4429027Z E1204 10:25:37.684000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4429134Z ('RERUN', {'yellow': True}) [1.7760s] [100%] 2025-12-04T10:35:20.4430358Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T10:35:20.4431321Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4431684Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.4432052Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:20.4432495Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:20.4432881Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4433333Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4433792Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4434280Z E1204 10:25:38.027000 
85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4434776Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4435243Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4435677Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.4436141Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4436575Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4436965Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4437337Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.4437878Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.4438404Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.4438872Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T10:35:20.4439307Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4439801Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4440252Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T10:35:20.4440738Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4441195Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T10:35:20.4441673Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4442125Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T10:35:20.4442599Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.4443005Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T10:35:20.4443401Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T10:35:20.4443810Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T10:35:20.4444305Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4444758Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T10:35:20.4445244Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, 
None].to(tl.float32) 2025-12-04T10:35:20.4445648Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T10:35:20.4446060Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T10:35:20.4446469Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T10:35:20.4446882Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T10:35:20.4447283Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T10:35:20.4447766Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T10:35:20.4448173Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T10:35:20.4448596Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T10:35:20.4449091Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4449614Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T10:35:20.4450151Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4450552Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T10:35:20.4450923Z E1204 10:25:38.027000 85285 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T10:35:20.4451411Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T10:35:20.4451774Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T10:35:20.4452255Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T10:35:20.4452706Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T10:35:20.4453136Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T10:35:20.4453774Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T10:35:20.4454366Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T10:35:20.4454668Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.4456453Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 
'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.4456907Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.4457790Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4458366Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4459175Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4459790Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4460537Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4461190Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4461839Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.4462766Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, 
out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4463069Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4463828Z E1204 10:25:38.027000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4463936Z ('RERUN', {'yellow': True}) [0.3100s] [100%] 2025-12-04T10:35:20.4465159Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T10:35:20.4466125Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4466487Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.4466850Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T10:35:20.4467284Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T10:35:20.4467671Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4468119Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4468576Z E1204 10:25:38.338000 85285 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4469067Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4469558Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4470024Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4470434Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.4470872Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4471307Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4471696Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4472069Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T10:35:20.4472611Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.4473104Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.4473564Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T10:35:20.4473996Z E1204 10:25:38.338000 85285 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4474483Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4474932Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T10:35:20.4475423Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4475876Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T10:35:20.4476357Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4476810Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T10:35:20.4477284Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.4477690Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T10:35:20.4478089Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T10:35:20.4478498Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T10:35:20.4478994Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4479455Z E1204 10:25:38.338000 85285 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T10:35:20.4479937Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4480341Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T10:35:20.4480709Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T10:35:20.4481118Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T10:35:20.4481536Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T10:35:20.4481936Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T10:35:20.4482443Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T10:35:20.4482850Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T10:35:20.4483273Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T10:35:20.4483768Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4484294Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T10:35:20.4484832Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, 
None].to(tl.float32) 2025-12-04T10:35:20.4485246Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T10:35:20.4485635Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T10:35:20.4486150Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T10:35:20.4486515Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T10:35:20.4486997Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T10:35:20.4487450Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T10:35:20.4487882Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T10:35:20.4488518Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T10:35:20.4489114Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T10:35:20.4489418Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.4491205Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': 
{'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.4491668Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.4492550Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4493125Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4493886Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4494500Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4495252Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4495902Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4496459Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 
2025-12-04T10:35:20.4497390Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4497703Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4498458Z E1204 10:25:38.338000 85285 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4498542Z FAILED [0.3099s] [100%] 2025-12-04T10:35:20.4498546Z 2025-12-04T10:35:20.4498670Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.4498995Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T10:35:20.4499168Z Traceback (most recent call last): 2025-12-04T10:35:20.4499534Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.4499729Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.4500184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.4500393Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4500827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4500987Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.4501422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4501544Z mb_compiled_graph = fx_codegen_and_compile( 
2025-12-04T10:35:20.4501994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4502265Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4502712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4502831Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4503239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.4503337Z return self._compile_to_module() 2025-12-04T10:35:20.4503744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4504517Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4504958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4505104Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4505549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4505772Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4506273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4506375Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4506809Z File "/tmp/tmpur3ppmli/7m/c7m4dwiqluuqqmgfxwny7dlzxpghk3ymbq5zmaxutbg7xqmtwnwg.py", line 74, in 2025-12-04T10:35:20.4507247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.4507342Z kernel.precompile( 2025-12-04T10:35:20.4508008Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.4508109Z self._precompile_worker() 2025-12-04T10:35:20.4508611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.4508769Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.4509275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4509439Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4509818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4510025Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4510398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4510680Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4510869Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4511496Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4511566Z ^ 2025-12-04T10:35:20.4511959Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4511964Z 2025-12-04T10:35:20.4512566Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4512576Z 2025-12-04T10:35:20.4512580Z 2025-12-04T10:35:20.4512760Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4513505Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4513512Z 2025-12-04T10:35:20.4513744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4513931Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4514015Z frames [('total', 1)] 2025-12-04T10:35:20.4514109Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4514515Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4514698Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4514842Z graph_break [] 2025-12-04T10:35:20.4515167Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T10:35:20.4515266Z Traceback (most recent call last): 2025-12-04T10:35:20.4515730Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.4515924Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.4516336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.4516552Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4516989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4517152Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.4517634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4517755Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.4518209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4518479Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4518921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4519042Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4519446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.4519547Z return self._compile_to_module() 2025-12-04T10:35:20.4519954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4520094Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4520533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4520639Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4521062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4521302Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4521798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4521901Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4522336Z File "/tmp/tmpvmp1fq0e/7k/c7kiahdzmh42zojv5a6ezlmsnxgpj7trxyfdp4bpkzqeng536ymo.py", line 74, in 
2025-12-04T10:35:20.4522736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.4522828Z kernel.precompile( 2025-12-04T10:35:20.4523299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.4523403Z self._precompile_worker() 2025-12-04T10:35:20.4523905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.4524052Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.4524561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4524727Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4525110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4525382Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4525796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4526128Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4526319Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4526877Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4526944Z ^ 2025-12-04T10:35:20.4527332Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4527337Z 2025-12-04T10:35:20.4527942Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4527991Z 2025-12-04T10:35:20.4527996Z 2025-12-04T10:35:20.4528174Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4528914Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4528921Z 2025-12-04T10:35:20.4529144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4529333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4529415Z frames [('total', 1)] 2025-12-04T10:35:20.4529508Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4529912Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4530095Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4530172Z graph_break [] 2025-12-04T10:35:20.4530355Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4530434Z frames [('total', 1)] 2025-12-04T10:35:20.4530528Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4530710Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4531103Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4531184Z graph_break [] 2025-12-04T10:35:20.4531346Z =================================== FAILURES =================================== 2025-12-04T10:35:20.4531668Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T10:35:20.4531772Z Traceback (most recent call last): 2025-12-04T10:35:20.4532128Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.4532329Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.4532741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.4532948Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4533384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4533542Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.4533975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4534095Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.4534549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4534821Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4535311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4535468Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4535879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.4535974Z return self._compile_to_module() 2025-12-04T10:35:20.4536389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4536528Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4536964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4537068Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4537532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4537730Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4542302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4542433Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4542871Z File "/tmp/tmpx7f8yy_3/hr/chr7fxwmlbid4fzq5dnbajg3fajamjdl26soiv2pwhkegvsitn6q.py", line 74, in 2025-12-04T10:35:20.4543274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.4543369Z kernel.precompile( 2025-12-04T10:35:20.4543854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.4543958Z self._precompile_worker() 2025-12-04T10:35:20.4544475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.4544636Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.4545143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4545316Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4545766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4545975Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.4546360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4546647Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4546858Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4547414Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4547490Z ^ 2025-12-04T10:35:20.4547897Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4547903Z 2025-12-04T10:35:20.4548514Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4548519Z 2025-12-04T10:35:20.4548523Z 2025-12-04T10:35:20.4548712Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4549447Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4549498Z 2025-12-04T10:35:20.4549732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4549916Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4550043Z frames [('total', 1)] 2025-12-04T10:35:20.4550148Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4550547Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4550736Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.4550829Z graph_break [] 2025-12-04T10:35:20.4551006Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4551093Z frames [('total', 1)] 2025-12-04T10:35:20.4551197Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4551383Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4551829Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4551914Z graph_break [] 2025-12-04T10:35:20.4552092Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4552182Z frames [('total', 1)] 2025-12-04T10:35:20.4552281Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4552468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4552872Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4552950Z graph_break [] 2025-12-04T10:35:20.4553513Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.xml - 2025-12-04T10:35:20.4553654Z =========================== short test summary info ============================ 2025-12-04T10:35:20.4554372Z FAILED [0.3099s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4554935Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4555009Z ^ 2025-12-04T10:35:20.4555402Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4555454Z 2025-12-04T10:35:20.4556105Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4556110Z 2025-12-04T10:35:20.4556114Z 2025-12-04T10:35:20.4556299Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4557029Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4557037Z 2025-12-04T10:35:20.4557263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4557423Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.4557589Z ================== 1 failed, 187 deselected, 2 rerun in 2.43s ================== 2025-12-04T10:35:20.4557670Z Got exit code 1 2025-12-04T10:35:20.4557772Z Retrying single test... 
2025-12-04T10:35:20.4558175Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.xml
2025-12-04T10:35:20.4558319Z ============================= test session starts ==============================
2025-12-04T10:35:20.4558615Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.4558751Z cachedir: .pytest_cache
2025-12-04T10:35:20.4559205Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.4559310Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.4559441Z configfile: pytest.ini
2025-12-04T10:35:20.4559912Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.4560096Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.4560772Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda
2025-12-04T10:35:20.4560868Z Running 1 items in this shard
2025-12-04T10:35:20.4560872Z
2025-12-04T10:35:20.4562139Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4563075Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4563439Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.4563809Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.4564240Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4564639Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.4565093Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4565550Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4566105Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4566666Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4567144Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4567509Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.4567957Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4568360Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.4568745Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4569124Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.4569671Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4570131Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4570639Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4571066Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4571608Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4572060Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4572552Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4573001Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4573520Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4573974Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4574405Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4574825Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4575230Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4575637Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4576192Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4576648Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4577144Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4577588Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4577955Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.4578375Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4578740Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.4579212Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4579661Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4580065Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4580499Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4580999Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4581581Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4582220Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4582636Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4583059Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.4583546Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4583929Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.4584412Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4584910Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4585343Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4585938Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4586549Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4586852Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4588648Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4589149Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4590055Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4590597Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4591377Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4591957Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4592708Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4593372Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4593898Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4594881Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4595328Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4596104Z E1204 10:25:48.354000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4596218Z ('RERUN', {'yellow': True}) [1.7605s] [100%]
2025-12-04T10:35:20.4597486Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4598428Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4598795Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.4599182Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.4599616Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4600021Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.4600478Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4600933Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4601428Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4601961Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4602442Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4602808Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.4603249Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4603650Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.4604035Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4604419Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.4604964Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4605407Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4605968Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4606393Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4606925Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4607375Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4608135Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4608588Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4609176Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4609642Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4610071Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4610493Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4610896Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4611298Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4611798Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4612253Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4612742Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4613203Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4613568Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.4613982Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4614348Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.4614763Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4615213Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4615619Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4616056Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4616552Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4617043Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4617646Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4618048Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4618480Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.4618964Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4619383Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.4619870Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4620366Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4620804Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4621404Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4622010Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4622316Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4624107Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4624604Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4625505Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4626038Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4626808Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4627386Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4628140Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4628808Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4629324Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4630303Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4630647Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4631413Z E1204 10:25:48.695000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4631525Z ('RERUN', {'yellow': True}) [0.3086s] [100%]
2025-12-04T10:35:20.4632794Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0
2025-12-04T10:35:20.4633728Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4634088Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.4634467Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15
2025-12-04T10:35:20.4634903Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16
2025-12-04T10:35:20.4635306Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.4635790Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4636270Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4636770Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4637305Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4637783Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4638153Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.4638591Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4639001Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.4639387Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4639765Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index
2025-12-04T10:35:20.4640313Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4640757Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.4641268Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1])
2025-12-04T10:35:20.4641692Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4642308Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4642760Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0)
2025-12-04T10:35:20.4643250Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4643710Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0)
2025-12-04T10:35:20.4644227Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4644681Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32)
2025-12-04T10:35:20.4645114Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.4645533Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10)
2025-12-04T10:35:20.4645976Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11
2025-12-04T10:35:20.4646390Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12
2025-12-04T10:35:20.4646892Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4647346Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0)
2025-12-04T10:35:20.4647843Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4648241Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11
2025-12-04T10:35:20.4648645Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0
2025-12-04T10:35:20.4649067Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19)
2025-12-04T10:35:20.4649433Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05
2025-12-04T10:35:20.4649850Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21
2025-12-04T10:35:20.4650296Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22)
2025-12-04T10:35:20.4650698Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23
2025-12-04T10:35:20.4651135Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24)
2025-12-04T10:35:20.4651626Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4652117Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf"))
2025-12-04T10:35:20.4652700Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4653102Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31
2025-12-04T10:35:20.4653546Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0
2025-12-04T10:35:20.4654032Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33)
2025-12-04T10:35:20.4654409Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0
2025-12-04T10:35:20.4654891Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35)
2025-12-04T10:35:20.4655384Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv)
2025-12-04T10:35:20.4655823Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32)
2025-12-04T10:35:20.4656419Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask)
2025-12-04T10:35:20.4657018Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None)
2025-12-04T10:35:20.4657315Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4659156Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4659615Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4660551Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4661085Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4661844Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4662431Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4663181Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4663837Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4664358Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4665330Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4665701Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4666489Z E1204 10:25:49.006000 85466 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4666578Z FAILED [0.3090s] [100%] 2025-12-04T10:35:20.4666583Z 2025-12-04T10:35:20.4666704Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.4667038Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T10:35:20.4667186Z Traceback (most recent call last): 2025-12-04T10:35:20.4667545Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.4667756Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.4668170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.4668389Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4668827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4668986Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.4669423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4669548Z mb_compiled_graph = fx_codegen_and_compile( 
2025-12-04T10:35:20.4670016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4670292Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4670737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4670868Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4671322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.4671424Z return self._compile_to_module() 2025-12-04T10:35:20.4671848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4671988Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4672444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4672550Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4672974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4673184Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4673686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4673804Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4674231Z File "/tmp/tmp2inw7ps3/7n/c7nwg5rj27h3h5u7hcqs2e6kxmi2hnt4w6cfhufcsvbc4eixm7wx.py", line 74, in <module> 2025-12-04T10:35:20.4674621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.4674723Z kernel.precompile( 2025-12-04T10:35:20.4675246Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.4675347Z self._precompile_worker() 2025-12-04T10:35:20.4675901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.4676106Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.4676631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4676795Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4677176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4677393Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4677805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4678110Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4678307Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4678862Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4678944Z ^ 2025-12-04T10:35:20.4679354Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4679358Z 2025-12-04T10:35:20.4679980Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4679984Z 2025-12-04T10:35:20.4679988Z 2025-12-04T10:35:20.4680176Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4680920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4680936Z 2025-12-04T10:35:20.4681161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4681350Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4681451Z frames [('total', 1)] 2025-12-04T10:35:20.4681599Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4682016Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4682220Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4682304Z graph_break [] 2025-12-04T10:35:20.4682635Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T10:35:20.4682758Z Traceback (most recent call last): 2025-12-04T10:35:20.4683120Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.4683330Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.4683743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.4683962Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4684412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4684577Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.4685020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4685142Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.4685668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4685982Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4686473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4686605Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4687013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.4687112Z return self._compile_to_module() 2025-12-04T10:35:20.4687537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4687674Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4688155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4688276Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4688772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4688988Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4689495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4689599Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4690047Z File "/tmp/tmpjfoamug5/rn/crnti6lnauzipbt65gg7d4qqts3r65qrrpudfnr4ju6pexgwkqoc.py", line 74, in <module> 
2025-12-04T10:35:20.4690443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.4690542Z kernel.precompile( 2025-12-04T10:35:20.4691023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.4691121Z self._precompile_worker() 2025-12-04T10:35:20.4691649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.4691802Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.4692352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4692529Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4692913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4693136Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4693526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4693812Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4694011Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4694573Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4694655Z ^ 2025-12-04T10:35:20.4695053Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4695057Z 2025-12-04T10:35:20.4695680Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4695687Z 2025-12-04T10:35:20.4695692Z 2025-12-04T10:35:20.4695918Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4696733Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4696776Z 2025-12-04T10:35:20.4697009Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4697188Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4697269Z frames [('total', 1)] 2025-12-04T10:35:20.4697369Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4697766Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4697969Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4698044Z graph_break [] 2025-12-04T10:35:20.4698221Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4698312Z frames [('total', 1)] 2025-12-04T10:35:20.4698448Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4698628Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4699076Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4699154Z graph_break [] 2025-12-04T10:35:20.4699279Z =================================== FAILURES =================================== 2025-12-04T10:35:20.4699603Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T10:35:20.4699701Z Traceback (most recent call last): 2025-12-04T10:35:20.4700065Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.4700256Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.4700667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.4700888Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4701321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4701495Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.4701928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4702093Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.4702560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4702826Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4703277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4703402Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4703804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.4703914Z return self._compile_to_module() 2025-12-04T10:35:20.4704320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4704456Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4704906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4705012Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4705436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4705653Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4706218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4706325Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4706793Z File "/tmp/tmp4m9g6blk/fs/cfsxzog75va7fmvrop2h6illmb2t262bbgyjddo4lx2jolzeoqvu.py", line 74, in <module> 2025-12-04T10:35:20.4707190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.4707281Z kernel.precompile( 2025-12-04T10:35:20.4707962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.4708066Z self._precompile_worker() 2025-12-04T10:35:20.4708573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.4708725Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.4709304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4709470Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4709854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4710055Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.4710427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4710727Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4710917Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4711481Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4711553Z ^ 2025-12-04T10:35:20.4711942Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4711949Z 2025-12-04T10:35:20.4712567Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4712572Z 2025-12-04T10:35:20.4712576Z 2025-12-04T10:35:20.4712812Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4713556Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4713561Z 2025-12-04T10:35:20.4713784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4713971Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4714068Z frames [('total', 1)] 2025-12-04T10:35:20.4714165Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4714571Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4714756Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.4714831Z graph_break [] 2025-12-04T10:35:20.4715012Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4715097Z frames [('total', 1)] 2025-12-04T10:35:20.4715187Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4715378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4715771Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4715918Z graph_break [] 2025-12-04T10:35:20.4716096Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4716177Z frames [('total', 1)] 2025-12-04T10:35:20.4716269Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4716505Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4716897Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.4716977Z graph_break [] 2025-12-04T10:35:20.4717540Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.xml - 2025-12-04T10:35:20.4717687Z =========================== short test summary info ============================ 2025-12-04T10:35:20.4718392Z FAILED [0.3090s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4718992Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4719069Z ^ 2025-12-04T10:35:20.4719454Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4719459Z 2025-12-04T10:35:20.4720068Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4720072Z 2025-12-04T10:35:20.4720076Z 2025-12-04T10:35:20.4720254Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4720987Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4720995Z 2025-12-04T10:35:20.4721216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4721360Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.4721543Z ================== 1 failed, 187 deselected, 2 rerun in 2.41s ================== 2025-12-04T10:35:20.4721625Z Got exit code 1 2025-12-04T10:35:20.4722150Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T10:35:20.4722555Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.4722955Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.xml 2025-12-04T10:35:20.4723104Z ============================= test session starts ============================== 2025-12-04T10:35:20.4723407Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.4723504Z cachedir: .pytest_cache 2025-12-04T10:35:20.4723950Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.4724053Z rootdir: 
/var/lib/jenkins/workspace 2025-12-04T10:35:20.4724148Z configfile: pytest.ini 2025-12-04T10:35:20.4724604Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.4724795Z collecting ... collected 188 items / 33 deselected / 155 selected 2025-12-04T10:35:20.4724914Z stepcurrent: skipping 33 already run items. 2025-12-04T10:35:20.4725011Z Running 155 items in this shard 2025-12-04T10:35:20.4725016Z 2025-12-04T10:35:20.4726297Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.4727322Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4727727Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.4728116Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.4728550Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.4728941Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4729427Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4729881Z E1204 10:25:59.242000 85647 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4730378Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4730870Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4731343Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4731718Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.4732158Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4732566Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4732956Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4733402Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.4733806Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.4734360Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.4734946Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4735525Z E1204 10:25:59.242000 85647 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4736024Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.4736489Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.4736919Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4737311Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.4737718Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.4738125Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.4738527Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.4738919Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.4739404Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.4739795Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.4740234Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.4740773Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 
2025-12-04T10:35:20.4741283Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.4741822Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4742224Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.4742601Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.4743080Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.4743455Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.4743932Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.4744384Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.4744863Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.4745461Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.4746066Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.4746367Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] 2025-12-04T10:35:20.4748403Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.4748856Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.4749789Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4750364Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4751127Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4751702Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4752488Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4753143Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4753656Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.4754647Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4754949Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4755742Z E1204 10:25:59.242000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4755864Z ('RERUN', {'yellow': True}) [1.9602s] [ 0%] 2025-12-04T10:35:20.4757136Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.4758117Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4758477Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.4758847Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.4759288Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.4759687Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4760135Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4760590Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4761084Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4761615Z E1204 10:25:59.769000 85647 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4762135Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4762516Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.4762952Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4763353Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4763732Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4764146Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.4764549Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.4765094Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.4765676Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4766298Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4766746Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.4767212Z E1204 10:25:59.769000 
85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.4767644Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4768031Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.4768432Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.4768837Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.4769200Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.4769592Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.4770025Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.4770419Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.4770848Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.4771344Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4771828Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.4772363Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 
2025-12-04T10:35:20.4772806Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.4773224Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.4773706Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.4774073Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.4774549Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.4775057Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.4775494Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.4776139Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.4776739Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.4777036Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.4779148Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 
'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.4779649Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.4780535Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4781064Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4781821Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4782394Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4783138Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4783790Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4784348Z E1204 10:25:59.769000 85647 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.4785325Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4785700Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4786483Z E1204 10:25:59.769000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4786591Z ('RERUN', {'yellow': True}) [0.4948s] [ 0%] 2025-12-04T10:35:20.4787840Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.4788819Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4789181Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.4789556Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.4789988Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 
2025-12-04T10:35:20.4790375Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4790820Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4791271Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4791802Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4792290Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4792759Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4793133Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.4793567Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4793966Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4794348Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4794719Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.4795118Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.4795676Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.4796332Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4796950Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4797400Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.4797860Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.4798283Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4798719Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.4799078Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.4799482Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.4799843Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.4800229Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.4800668Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.4801061Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 
2025-12-04T10:35:20.4801490Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.4801981Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4802469Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.4803045Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4803456Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.4803831Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.4804313Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.4804680Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.4805157Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.4805609Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.4806042Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.4806631Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 
2025-12-04T10:35:20.4807273Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.4807570Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.4809875Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.4810399Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.4811289Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4811820Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4812571Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4813146Z E1204 10:26:00.264000 85647 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4813894Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4814548Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4815128Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.4816157Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4816462Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4817218Z E1204 10:26:00.264000 85647 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4817303Z FAILED [0.4934s] [ 0%] 2025-12-04T10:35:20.4817308Z 2025-12-04T10:35:20.4817424Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.4817758Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T10:35:20.4817864Z Traceback (most recent call last): 2025-12-04T10:35:20.4818219Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.4818416Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.4818977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.4819230Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4819742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4819901Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.4820336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4820453Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.4820909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4821178Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4821663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4821789Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4822192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.4822295Z return self._compile_to_module() 2025-12-04T10:35:20.4822703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4822837Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4823279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4823381Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4823795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4823993Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4824488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4824594Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4825029Z File "/tmp/tmpypvs7ij7/xe/cxeifugu7yk62ihad5gfdz54t3j7qrhu3prwjgfdqr7lhebb5lua.py", line 137, in 2025-12-04T10:35:20.4825469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.4825578Z kernel.precompile( 2025-12-04T10:35:20.4826074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.4826173Z self._precompile_worker() 2025-12-04T10:35:20.4826682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.4826833Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.4827341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4827508Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4827884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4828093Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4828464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4828749Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4828937Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4829538Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4829656Z ^ 2025-12-04T10:35:20.4830044Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4830091Z 2025-12-04T10:35:20.4830703Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4830710Z 2025-12-04T10:35:20.4830714Z 2025-12-04T10:35:20.4830892Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4831625Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.4831633Z 2025-12-04T10:35:20.4831897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4832085Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4832170Z frames [('total', 1)] 2025-12-04T10:35:20.4832264Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4832662Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.4832846Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4832925Z graph_break [] 2025-12-04T10:35:20.4833255Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T10:35:20.4833361Z Traceback (most recent call last): 2025-12-04T10:35:20.4833713Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.4833910Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.4834323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.4834527Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4834963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4835121Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.4835610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4835740Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.4836209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4836481Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4836923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4837047Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4837450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.4837547Z return self._compile_to_module() 2025-12-04T10:35:20.4837954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4838091Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4838523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4838632Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4839047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4839287Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4839780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4839922Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4840357Z File "/tmp/tmp832i0crr/7f/c7fqqsrwyk54jujuqkn57ldog7uskihl3ptocf632lzu2mvbnmtx.py", line 137, in 
2025-12-04T10:35:20.4840747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.4840835Z kernel.precompile( 2025-12-04T10:35:20.4841303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.4841393Z self._precompile_worker() 2025-12-04T10:35:20.4841898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.4842087Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.4842593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4842760Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4843137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4843342Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4843710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4843989Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4848229Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4848864Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4848947Z ^ 2025-12-04T10:35:20.4849353Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4849362Z 2025-12-04T10:35:20.4850038Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4850044Z 2025-12-04T10:35:20.4850048Z 2025-12-04T10:35:20.4850243Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4850981Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.4850986Z 2025-12-04T10:35:20.4851229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4851417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4851505Z frames [('total', 1)] 2025-12-04T10:35:20.4851624Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4852029Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.4852224Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4852308Z graph_break [] 2025-12-04T10:35:20.4852491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4852582Z frames [('total', 1)] 2025-12-04T10:35:20.4852681Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4852866Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4853264Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.4853392Z graph_break [] 2025-12-04T10:35:20.4853519Z =================================== FAILURES =================================== 2025-12-04T10:35:20.4853891Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T10:35:20.4853994Z Traceback (most recent call last): 2025-12-04T10:35:20.4854361Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.4854567Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.4855013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.4855240Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.4855748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.4855981Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.4856414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.4856536Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.4856998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.4857273Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.4857731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.4857858Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.4858262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.4858371Z return self._compile_to_module() 2025-12-04T10:35:20.4858780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.4858918Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.4859429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.4859539Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.4860016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.4860216Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.4860715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.4860833Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.4861280Z File "/tmp/tmpkqiiv75c/t3/ct3mptplrgt2uvirncb4gzuqek6pqzinr2tvwlmepfb3pevb7mwa.py", line 137, in <module> 2025-12-04T10:35:20.4861681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.4861772Z kernel.precompile( 2025-12-04T10:35:20.4862247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.4862348Z self._precompile_worker() 2025-12-04T10:35:20.4862854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.4863002Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.4863511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4863679Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4864142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4864352Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:20.4864763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4865053Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4865246Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4865856Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4865929Z ^ 2025-12-04T10:35:20.4866323Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4866330Z 2025-12-04T10:35:20.4866975Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4866983Z 2025-12-04T10:35:20.4866987Z 2025-12-04T10:35:20.4867168Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4867912Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.4867917Z 2025-12-04T10:35:20.4868140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4868320Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4868408Z frames [('total', 1)] 2025-12-04T10:35:20.4868509Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4868911Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.4869105Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.4869187Z graph_break [] 2025-12-04T10:35:20.4869369Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4869455Z frames [('total', 1)] 2025-12-04T10:35:20.4869549Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4869735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4870173Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.4870262Z graph_break [] 2025-12-04T10:35:20.4870438Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.4870525Z frames [('total', 1)] 2025-12-04T10:35:20.4870627Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.4870814Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.4871203Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.4871295Z graph_break [] 2025-12-04T10:35:20.4871850Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.xml - 2025-12-04T10:35:20.4871998Z =========================== short test summary info ============================ 2025-12-04T10:35:20.4872714Z FAILED [0.4934s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.4873315Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4873436Z ^ 2025-12-04T10:35:20.4873830Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4873834Z 2025-12-04T10:35:20.4874447Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.4874493Z 2025-12-04T10:35:20.4874496Z 2025-12-04T10:35:20.4874679Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.4875420Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.4875425Z 2025-12-04T10:35:20.4875655Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.4875832Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.4876077Z ================== 1 failed, 33 deselected, 2 rerun in 2.98s =================== 2025-12-04T10:35:20.4876158Z Got exit code 1 2025-12-04T10:35:20.4876247Z Retrying single test... 
2025-12-04T10:35:20.4876661Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.xml 2025-12-04T10:35:20.4876798Z ============================= test session starts ============================== 2025-12-04T10:35:20.4877101Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.4877199Z cachedir: .pytest_cache 2025-12-04T10:35:20.4877644Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.4877753Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.4877846Z configfile: pytest.ini 2025-12-04T10:35:20.4878311Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.4878508Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.4879170Z stepcurrent: skipping 33 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.4879276Z Running 1 items in this shard 2025-12-04T10:35:20.4879281Z 2025-12-04T10:35:20.4880549Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.4881535Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4881898Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.4882272Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.4882714Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.4883100Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4883556Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4884010Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4884547Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 
2025-12-04T10:35:20.4885042Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4885555Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4885960Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.4886417Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4886817Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4887240Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4887614Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.4888024Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.4888569Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.4889161Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4889859Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4890320Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + 
(0)) 2025-12-04T10:35:20.4890787Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.4891218Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4891672Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.4892034Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.4892435Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.4892802Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.4893193Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.4893632Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.4894030Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.4894455Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.4894956Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4895441Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.4896032Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = 
triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4896434Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.4896853Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.4897338Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.4897702Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.4898186Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.4898681Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.4899166Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.4899759Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.4900356Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.4900662Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.4902695Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 
'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.4903228Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.4904209Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.4904755Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.4905535Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.4906144Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.4906893Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.4907551Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.4908368Z E1204 10:26:09.806000 
85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.4909426Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4909745Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.4910503Z E1204 10:26:09.806000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.4910621Z ('RERUN', {'yellow': True}) [1.9511s] [100%] 2025-12-04T10:35:20.4911895Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.4912880Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.4913239Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.4913610Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.4914048Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 
2025-12-04T10:35:20.4914439Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.4914898Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.4915358Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.4915961Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.4916457Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.4916924Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.4917302Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.4917737Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.4918134Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.4918523Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.4918894Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.4919301Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.4919842Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.4920491Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4921117Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.4921569Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.4922040Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.4922467Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.4922907Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.4923268Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.4923670Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.4924040Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.4924428Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.4924869Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.4925262Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 
2025-12-04T10:35:20.4925693Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.4926199Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.4926685Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.4927276Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.4927680Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.4928049Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.4928542Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.4928917Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.4929401Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.4929855Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.4930291Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.4930887Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 
2025-12-04T10:35:20.4931530Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.4931877Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4933949Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4934415Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4935314Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4935907Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4936665Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4937248Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4938002Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4938698Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4939262Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4940239Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4940551Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4941310Z E1204 10:26:10.334000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4941425Z ('RERUN', {'yellow': True}) [0.4958s] [100%]
2025-12-04T10:35:20.4942641Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.4943621Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4944021Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.4944430Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150
2025-12-04T10:35:20.4944876Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256
2025-12-04T10:35:20.4945259Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.4945756Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.4946295Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.4946795Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.4947298Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.4947768Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.4948144Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0
2025-12-04T10:35:20.4948584Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.4948981Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.4949370Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.4949739Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index
2025-12-04T10:35:20.4950149Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15
2025-12-04T10:35:20.4950731Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32)
2025-12-04T10:35:20.4951317Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4951901Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.4952347Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.4952818Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1])
2025-12-04T10:35:20.4953244Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.4953645Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.4954004Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0
2025-12-04T10:35:20.4954400Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.4954818Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05
2025-12-04T10:35:20.4955201Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.4955691Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.4956087Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.4956515Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.4957018Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.4957544Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf"))
2025-12-04T10:35:20.4958086Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32)
2025-12-04T10:35:20.4958492Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17
2025-12-04T10:35:20.4958862Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0
2025-12-04T10:35:20.4959345Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19)
2025-12-04T10:35:20.4959713Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0
2025-12-04T10:35:20.4960199Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21)
2025-12-04T10:35:20.4960648Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv)
2025-12-04T10:35:20.4961078Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32)
2025-12-04T10:35:20.4961713Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask)
2025-12-04T10:35:20.4962311Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None)
2025-12-04T10:35:20.4962618Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.4964648Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.4965108Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.4966047Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4966626Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4967421Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4968005Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4968753Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4969444Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4969968Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.4970946Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4971260Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.4972016Z E1204 10:26:10.826000 85871 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4972109Z FAILED [0.4905s] [100%]
2025-12-04T10:35:20.4972116Z
2025-12-04T10:35:20.4972236Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.4972565Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.4972671Z Traceback (most recent call last):
2025-12-04T10:35:20.4973029Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4973273Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4973696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4973905Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4974345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4974517Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4974952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4975084Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4975538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4975842Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4976312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4976433Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4976844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4976942Z return self._compile_to_module()
2025-12-04T10:35:20.4977403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4977543Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4978019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4978134Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4978558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4978755Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4979330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4979435Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4979925Z File "/tmp/tmpq0aachsr/ej/cejtcis6knpjxir6ekapo422t5j5vdbcejxujwvscyxqahq5ixnz.py", line 137, in
2025-12-04T10:35:20.4980322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4980415Z kernel.precompile(
2025-12-04T10:35:20.4980898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4980995Z self._precompile_worker()
2025-12-04T10:35:20.4981504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4981657Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4982162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4982334Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4982719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4982926Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4983308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4983593Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4983795Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.4984445Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.4984519Z ^
2025-12-04T10:35:20.4984915Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.4984920Z
2025-12-04T10:35:20.4985528Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.4985533Z
2025-12-04T10:35:20.4985539Z
2025-12-04T10:35:20.4985726Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.4986463Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.4986469Z
2025-12-04T10:35:20.4986702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.4986888Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.4986968Z frames [('total', 1)]
2025-12-04T10:35:20.4987068Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.4987464Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.4987711Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.4987793Z graph_break []
2025-12-04T10:35:20.4988126Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.4988299Z Traceback (most recent call last):
2025-12-04T10:35:20.4988665Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.4988863Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.4989286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.4989496Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.4989932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.4990140Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.4990574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.4990703Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.4991155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.4991428Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.4991875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.4991996Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.4992409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.4992514Z return self._compile_to_module()
2025-12-04T10:35:20.4992927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.4993076Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.4993528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.4993634Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.4994189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.4994389Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.4994897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.4995000Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.4995445Z File "/tmp/tmp6kxbri77/3b/c3b3hyz5useo4gsaajnsxvx7zudxdpssk6acfun6imgryb5gwjqd.py", line 137, in
2025-12-04T10:35:20.4995913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.4996002Z kernel.precompile(
2025-12-04T10:35:20.4996481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.4996577Z self._precompile_worker()
2025-12-04T10:35:20.4997086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.4997237Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.4997748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.4997915Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.4998426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.4998648Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.4999102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.4999412Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.4999622Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5000281Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5000352Z ^
2025-12-04T10:35:20.5000769Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5000777Z
2025-12-04T10:35:20.5001463Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5001469Z
2025-12-04T10:35:20.5001476Z
2025-12-04T10:35:20.5001660Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5002397Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5002404Z
2025-12-04T10:35:20.5002625Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5002810Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5002890Z frames [('total', 1)]
2025-12-04T10:35:20.5002983Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5003386Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5003571Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5003650Z graph_break []
2025-12-04T10:35:20.5003825Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5003908Z frames [('total', 1)]
2025-12-04T10:35:20.5004012Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5004192Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5004626Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5004711Z graph_break []
2025-12-04T10:35:20.5004827Z =================================== FAILURES ===================================
2025-12-04T10:35:20.5005159Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _
2025-12-04T10:35:20.5005258Z Traceback (most recent call last):
2025-12-04T10:35:20.5005627Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5005855Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5006290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5006502Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5006946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5007109Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5007550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5007665Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5008258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5008605Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5009104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5009238Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5009653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5009755Z return self._compile_to_module()
2025-12-04T10:35:20.5010165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5010296Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5010742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5010905Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5011327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5011522Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5012019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5012129Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5012574Z File "/tmp/tmpapf2407i/3d/c3dmve2uwznbo7hqpizqbwgh5fqjahpm6cco2r7sntvvta5bkn6i.py", line 137, in
2025-12-04T10:35:20.5012966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5013055Z kernel.precompile(
2025-12-04T10:35:20.5013527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5013629Z self._precompile_worker()
2025-12-04T10:35:20.5014137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5014286Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5014790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5015026Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5015416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5015645Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5016043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5016328Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5016525Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5017129Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5017201Z ^
2025-12-04T10:35:20.5017588Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5017593Z
2025-12-04T10:35:20.5018196Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5018204Z
2025-12-04T10:35:20.5018208Z
2025-12-04T10:35:20.5018390Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5019209Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5019215Z
2025-12-04T10:35:20.5019485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5019661Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5019746Z frames [('total', 1)]
2025-12-04T10:35:20.5019838Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5020233Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5020418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5020496Z graph_break []
2025-12-04T10:35:20.5020680Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5020774Z frames [('total', 1)]
2025-12-04T10:35:20.5020865Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5021087Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5021478Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5021557Z graph_break []
2025-12-04T10:35:20.5021732Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5021816Z frames [('total', 1)]
2025-12-04T10:35:20.5021912Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5022093Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5022477Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.5022550Z graph_break []
2025-12-04T10:35:20.5023111Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.xml -
2025-12-04T10:35:20.5023261Z =========================== short test summary info ============================
2025-12-04T10:35:20.5023972Z FAILED [0.4905s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5024612Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.5024690Z ^
2025-12-04T10:35:20.5025088Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5025093Z
2025-12-04T10:35:20.5025748Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5025759Z
2025-12-04T10:35:20.5025763Z
2025-12-04T10:35:20.5025944Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5026676Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda
2025-12-04T10:35:20.5026683Z
2025-12-04T10:35:20.5026904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5027057Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.5027222Z ================== 1 failed, 187 deselected, 2 rerun in 2.97s ==================
2025-12-04T10:35:20.5027309Z Got exit code 1
2025-12-04T10:35:20.5027393Z Retrying single test...
2025-12-04T10:35:20.5027790Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.xml 2025-12-04T10:35:20.5027927Z ============================= test session starts ============================== 2025-12-04T10:35:20.5028291Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.5028386Z cachedir: .pytest_cache 2025-12-04T10:35:20.5028886Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.5028989Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.5029077Z configfile: pytest.ini 2025-12-04T10:35:20.5029540Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.5029730Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.5030394Z stepcurrent: skipping 33 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.5030486Z Running 1 items in this shard 2025-12-04T10:35:20.5030493Z 2025-12-04T10:35:20.5031758Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.5032742Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5033108Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.5033478Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.5033925Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.5034310Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.5034762Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5035261Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5035794Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 
2025-12-04T10:35:20.5036302Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.5036776Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5037142Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.5037581Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5037975Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5038364Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5038734Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.5039140Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.5039728Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.5040311Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.5040943Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.5041391Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + 
(0)) 2025-12-04T10:35:20.5041861Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.5042329Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5042725Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.5043093Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.5043491Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.5043854Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.5044246Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.5044680Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.5045083Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.5045506Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.5046046Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5046580Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.5047117Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = 
triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.5047520Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.5047894Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.5048383Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.5048750Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.5049231Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.5049680Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.5050108Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.5050703Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.5051343Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.5051680Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.5053762Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 
'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5054220Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5055112Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5055640Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5056395Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5056971Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5057725Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5058418Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5058937Z E1204 10:26:20.482000 
86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5059972Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5062767Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5063531Z E1204 10:26:20.482000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5063641Z ('RERUN', {'yellow': True}) [1.9787s] [100%] 2025-12-04T10:35:20.5064868Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.5065891Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5066329Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.5066705Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.5067143Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 
2025-12-04T10:35:20.5067527Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.5067981Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5068483Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5068980Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5069472Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.5069941Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5070318Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.5070751Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5071157Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5071541Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5071917Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.5072322Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.5072930Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.5073513Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.5074092Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.5074628Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.5075088Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.5075512Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5075950Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.5076316Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.5076721Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.5077124Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.5077514Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.5077957Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.5078349Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 
2025-12-04T10:35:20.5078773Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.5079264Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5079794Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.5080335Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T10:35:20.5080743Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.5081118Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.5081598Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.5081960Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.5082451Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.5082895Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.5083334Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.5083964Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 
2025-12-04T10:35:20.5084569Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.5084869Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.5086952Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5087462Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5088349Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5088926Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5089683Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5090262Z E1204 10:26:21.007000 86095 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5091005Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5091697Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5092216Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5093197Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5093503Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5094263Z E1204 10:26:21.007000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5094378Z ('RERUN', {'yellow': True}) [0.4932s] [100%] 2025-12-04T10:35:20.5095591Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.5096660Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5097017Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.5097390Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T10:35:20.5097827Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T10:35:20.5098259Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.5098707Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5099222Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5099720Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5100209Z E1204 10:26:21.503000 86095 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.5100716Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5101088Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T10:35:20.5101528Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5101925Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5102306Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5102676Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.5103129Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T10:35:20.5103670Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T10:35:20.5104255Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.5104829Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.5105279Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.5105736Z E1204 10:26:21.503000 
86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T10:35:20.5106167Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5106563Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.5106919Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T10:35:20.5107361Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.5107723Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.5108264Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.5108708Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.5109102Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.5109608Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.5110101Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5110587Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T10:35:20.5111123Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 
2025-12-04T10:35:20.5111527Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T10:35:20.5111956Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T10:35:20.5112438Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T10:35:20.5112802Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T10:35:20.5113287Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T10:35:20.5113732Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T10:35:20.5114175Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T10:35:20.5114845Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T10:35:20.5115444Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T10:35:20.5115794Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.5117825Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 
'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5118288Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5119231Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5119764Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5120517Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5121098Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5121887Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5122538Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5123051Z E1204 10:26:21.503000 86095 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5124027Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5124379Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5125137Z E1204 10:26:21.503000 86095 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5125225Z FAILED [0.4940s] [100%] 2025-12-04T10:35:20.5125230Z 2025-12-04T10:35:20.5125346Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.5125712Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T10:35:20.5125829Z Traceback (most recent call last): 2025-12-04T10:35:20.5126226Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.5126425Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.5126838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.5127044Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.5127479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.5127637Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.5128071Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.5128193Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.5128643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.5128917Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.5129357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.5129478Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.5129883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.5130023Z return self._compile_to_module() 2025-12-04T10:35:20.5130434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.5130568Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.5131009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.5131126Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.5131541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.5131787Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.5132280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.5132382Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.5132813Z File "/tmp/tmplm7i5550/3g/c3gpe46xaiv3dm27odfp43z4bvt5nzjdnwmjy6b2wc4c7yncq5ji.py", line 137, in <module> 2025-12-04T10:35:20.5133201Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.5133294Z kernel.precompile( 2025-12-04T10:35:20.5133761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.5133900Z self._precompile_worker() 2025-12-04T10:35:20.5134412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.5134559Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.5135060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5135232Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5135608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5135814Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5136181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5136503Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5136699Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.5137302Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5137374Z ^ 2025-12-04T10:35:20.5137762Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5137768Z 2025-12-04T10:35:20.5138375Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.5138379Z 2025-12-04T10:35:20.5138387Z 2025-12-04T10:35:20.5138563Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.5139340Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.5139350Z 2025-12-04T10:35:20.5139575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.5139754Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5139833Z frames [('total', 1)] 2025-12-04T10:35:20.5139930Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5140373Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.5140562Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5140640Z graph_break [] 2025-12-04T10:35:20.5140967Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T10:35:20.5141069Z Traceback (most recent call last): 2025-12-04T10:35:20.5141428Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.5141620Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.5142083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.5142290Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 
2025-12-04T10:35:20.5142727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.5142884Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.5143311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.5143431Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.5143880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.5144198Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.5144641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.5144763Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.5145177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.5145279Z return self._compile_to_module() 2025-12-04T10:35:20.5145714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.5145862Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.5146306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.5146467Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.5146882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.5147079Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.5147578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 
35, in _reload_python_module 2025-12-04T10:35:20.5147680Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.5152308Z File "/tmp/tmp7fafc3o6/uf/cufki3gdnwymicpsh4qp3xw2dso54p4p3y5ilv7dyzckczi3dyxc.py", line 137, in <module> 2025-12-04T10:35:20.5152733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.5152827Z kernel.precompile( 2025-12-04T10:35:20.5153314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.5153418Z self._precompile_worker() 2025-12-04T10:35:20.5153933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.5154087Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.5154593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5154833Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5155222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5155437Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5155861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5156155Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5156358Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.5157012Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5157082Z ^ 2025-12-04T10:35:20.5157480Z
ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5157485Z 2025-12-04T10:35:20.5158088Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.5158093Z 2025-12-04T10:35:20.5158097Z 2025-12-04T10:35:20.5158285Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.5159094Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.5159101Z 2025-12-04T10:35:20.5159338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.5159524Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5159607Z frames [('total', 1)] 2025-12-04T10:35:20.5159713Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5160113Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.5160307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5160386Z graph_break [] 2025-12-04T10:35:20.5160572Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5160663Z frames [('total', 1)] 2025-12-04T10:35:20.5160765Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5160996Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5161398Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.5161477Z graph_break [] 2025-12-04T10:35:20.5161597Z =================================== FAILURES 
=================================== 2025-12-04T10:35:20.5161937Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T10:35:20.5162038Z Traceback (most recent call last): 2025-12-04T10:35:20.5162405Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.5162597Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.5163007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.5163222Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.5163658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.5163832Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.5164265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.5164386Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.5164895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.5165165Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.5165603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.5165736Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.5166141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.5166291Z return self._compile_to_module() 2025-12-04T10:35:20.5166705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in 
_compile_to_module 2025-12-04T10:35:20.5166838Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.5167291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.5167397Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.5167822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.5168013Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.5168552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.5168665Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.5169104Z File "/tmp/tmplij4b26i/pe/cpeyjntdaihccxuruy2y24kny7tuxs4v3lxb7wctljlz63lw667t.py", line 137, in <module> 2025-12-04T10:35:20.5169494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.5169586Z kernel.precompile( 2025-12-04T10:35:20.5170061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.5170161Z self._precompile_worker() 2025-12-04T10:35:20.5170664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.5170810Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.5171446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5171618Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5171998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5172199Z
module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5172570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5172855Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5173045Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.5173651Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5173726Z ^ 2025-12-04T10:35:20.5174117Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5174125Z 2025-12-04T10:35:20.5174734Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.5174740Z 2025-12-04T10:35:20.5174744Z 2025-12-04T10:35:20.5174966Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.5175790Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.5175800Z 2025-12-04T10:35:20.5176083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.5176268Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5176359Z frames [('total', 1)] 2025-12-04T10:35:20.5176455Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5176923Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.5177108Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5177191Z graph_break [] 2025-12-04T10:35:20.5177377Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5177463Z frames [('total', 1)] 2025-12-04T10:35:20.5177557Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5177746Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5178138Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.5178223Z graph_break [] 2025-12-04T10:35:20.5178447Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5178530Z frames [('total', 1)] 2025-12-04T10:35:20.5178624Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5178812Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5179283Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.5179366Z graph_break [] 2025-12-04T10:35:20.5179923Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.xml - 2025-12-04T10:35:20.5180070Z =========================== short test summary info ============================ 2025-12-04T10:35:20.5180784Z FAILED [0.4940s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.5181433Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.5181514Z ^ 2025-12-04T10:35:20.5181902Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5181907Z 2025-12-04T10:35:20.5182520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.5182525Z 2025-12-04T10:35:20.5182529Z 2025-12-04T10:35:20.5182708Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.5183441Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.5183452Z 2025-12-04T10:35:20.5183679Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.5183830Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.5184007Z ================== 1 failed, 187 deselected, 2 rerun in 3.00s ================== 2025-12-04T10:35:20.5184089Z Got exit code 1 2025-12-04T10:35:20.5184612Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T10:35:20.5185014Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.5185414Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.xml 2025-12-04T10:35:20.5185578Z ============================= test session starts ============================== 2025-12-04T10:35:20.5185899Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.5185992Z cachedir: .pytest_cache 2025-12-04T10:35:20.5186444Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.5186599Z 
rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.5186690Z configfile: pytest.ini 2025-12-04T10:35:20.5187159Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.5187352Z collecting ... collected 188 items / 34 deselected / 154 selected 2025-12-04T10:35:20.5187479Z stepcurrent: skipping 34 already run items. 2025-12-04T10:35:20.5187575Z Running 154 items in this shard 2025-12-04T10:35:20.5187579Z 2025-12-04T10:35:20.5188741Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.5189811Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5190181Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.5190569Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.5190957Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.5191421Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5191923Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5192414Z E1204 10:26:30.982000 86319 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5192841Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.5193311Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5193690Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.5194047Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.5194555Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5195057Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5195570Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5196105Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5196553Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5197012Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5197424Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5197831Z E1204 10:26:30.982000 86319 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5198304Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5198990Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5199433Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5199925Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5200571Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.5201089Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.5201424Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.5201978Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.5202495Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.5203103Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.5203702Z E1204 10:26:30.982000 86319 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.5204111Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.5204517Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.5204912Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.5205456Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.5205946Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.5206410Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.5206900Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5207393Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5208046Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5208462Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5208869Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5209263Z E1204 10:26:30.982000 86319 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5210032Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5210485Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.5210899Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.5211281Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.5211708Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.5212148Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.5212568Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.5213021Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.5213441Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.5213879Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.5214377Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5214926Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.5215424Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.5215846Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.5216233Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.5216716Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.5217103Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.5217590Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.5218052Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.5218578Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.5219169Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.5219638Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.5219937Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.5221872Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5222372Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5223264Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5223835Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5224601Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5225174Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5225918Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5226617Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T10:35:20.5227135Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5228076Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5228382Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5229154Z E1204 10:26:30.982000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5229272Z ('RERUN', {'yellow': True}) [1.7872s] [ 0%] 2025-12-04T10:35:20.5230414Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.5231395Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5231758Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.5232141Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.5232534Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel 
= r0_numel 2025-12-04T10:35:20.5232997Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5233494Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5233995Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5234420Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.5234889Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5235279Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.5235687Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.5236237Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5236736Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5237247Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5237740Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5238223Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 
2025-12-04T10:35:20.5238676Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5239091Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5239491Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5239892Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5240581Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5241037Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5241531Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5242142Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.5242724Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.5243064Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.5243614Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.5244140Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.5244757Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.5245354Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.5245755Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.5246210Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.5246606Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.5247188Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.5247633Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.5248095Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.5248586Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5249031Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5249526Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5249939Z E1204 10:26:31.351000 86319 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5250346Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5250742Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5251423Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5251875Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.5252293Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.5252681Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.5253110Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.5253491Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.5253957Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.5254410Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.5254831Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.5255276Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.5255831Z 
E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5256322Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.5256817Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.5257243Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.5257628Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.5258157Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.5258549Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.5259109Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.5259574Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.5260101Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.5260593Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.5261110Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.5261415Z E1204 
10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.5263349Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5263805Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5264709Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5265279Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5266098Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5266671Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5267429Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5268118Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5268639Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5269575Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5269886Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5270697Z E1204 10:26:31.351000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5270808Z ('RERUN', {'yellow': True}) [0.3363s] [ 0%] 2025-12-04T10:35:20.5271960Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.5272879Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5273281Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.5273676Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.5274074Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.5274544Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5275011Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5275507Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5275990Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.5276462Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, 
R0_BLOCK)[None, :] 2025-12-04T10:35:20.5276852Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.5277211Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.5277754Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5278257Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5278773Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5279268Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5279762Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5280225Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5280642Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5281050Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5281463Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5282195Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', 
other=0.0).to(tl.float32) 2025-12-04T10:35:20.5282664Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5283165Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5283780Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.5284299Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.5284702Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.5285261Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.5285789Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.5286368Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.5286964Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.5287372Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.5287790Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.5288187Z E1204 10:26:31.689000 
86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.5288739Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.5289225Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.5289699Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.5290203Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5290736Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5291242Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5291659Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5292078Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5292477Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5293161Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5293657Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.5294083Z E1204 10:26:31.689000 86319 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.5294492Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.5294917Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.5295308Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.5295773Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.5296282Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.5296715Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.5297162Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.5297667Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5298163Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.5298663Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.5299159Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.5299554Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.5300040Z E1204 
10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5300481Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0
2025-12-04T10:35:20.5300964Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5301431Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5301962Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5302493Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5302961Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5303262Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.5305204Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5305714Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5306619Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5307155Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5308122Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5308708Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5309461Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5310118Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5310639Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5311583Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5311902Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5312732Z E1204 10:26:31.689000 86319 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5312821Z FAILED [0.3360s] [ 0%]
2025-12-04T10:35:20.5312825Z
2025-12-04T10:35:20.5312949Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.5313292Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5313393Z Traceback (most recent call last):
2025-12-04T10:35:20.5313763Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5313958Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5314436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5314651Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5315087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5315252Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5315680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5315799Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5316254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5316580Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5317034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5317152Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5317558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5317658Z return self._compile_to_module()
2025-12-04T10:35:20.5318064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5318198Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5318637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5318788Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5319211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5319410Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5319911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5320024Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5320466Z File "/tmp/tmpqn9luur0/ll/cllpwxipiwu4hbnavuptbgwyo4ilte4qmvu6wkoyi6wefkhxumw5.py", line 65, in
2025-12-04T10:35:20.5320866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5320954Z kernel.precompile(
2025-12-04T10:35:20.5321422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5321525Z self._precompile_worker()
2025-12-04T10:35:20.5322036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5322187Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5322692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5322898Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5323285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5323490Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5323862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5324150Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5324339Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5324938Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5325008Z ^
2025-12-04T10:35:20.5325398Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5325405Z
2025-12-04T10:35:20.5326029Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5326034Z
2025-12-04T10:35:20.5326038Z
2025-12-04T10:35:20.5326224Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5326972Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5327047Z
2025-12-04T10:35:20.5327269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5327445Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5327531Z frames [('total', 1)]
2025-12-04T10:35:20.5327624Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5328028Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5328213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5328294Z graph_break []
2025-12-04T10:35:20.5328644Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5328745Z Traceback (most recent call last):
2025-12-04T10:35:20.5329144Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5329341Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5329759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5329976Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5330410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5330574Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5331022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5331144Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5331608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5331878Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5332323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5332445Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5332848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5332984Z return self._compile_to_module()
2025-12-04T10:35:20.5333397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5333531Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5333972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5334083Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5334506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5334754Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5335249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5335353Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5335774Z File "/tmp/tmpe__hpp0d/7d/c7drbd5txr7yjssejo5fufwfqqyor6aele6uidtiqa6cbylljtad.py", line 65, in
2025-12-04T10:35:20.5336208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5336304Z kernel.precompile(
2025-12-04T10:35:20.5336773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5336912Z self._precompile_worker()
2025-12-04T10:35:20.5337419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5337571Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5338073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5338239Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5338613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5338818Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5339242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5339571Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5339761Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5340325Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5340402Z ^
2025-12-04T10:35:20.5340790Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5340797Z
2025-12-04T10:35:20.5341411Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5341417Z
2025-12-04T10:35:20.5341421Z
2025-12-04T10:35:20.5341599Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5342354Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5342371Z
2025-12-04T10:35:20.5342594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5342776Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5342859Z frames [('total', 1)]
2025-12-04T10:35:20.5342954Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5343398Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5343584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5343664Z graph_break []
2025-12-04T10:35:20.5343849Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5343929Z frames [('total', 1)]
2025-12-04T10:35:20.5344022Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5344208Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5344602Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5344728Z graph_break []
2025-12-04T10:35:20.5344854Z =================================== FAILURES ===================================
2025-12-04T10:35:20.5345192Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5345296Z Traceback (most recent call last):
2025-12-04T10:35:20.5345660Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5345852Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5346266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5346606Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5347039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5347211Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5347643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5347765Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5348220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5348486Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5348932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5349056Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5349506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5349606Z return self._compile_to_module()
2025-12-04T10:35:20.5350017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5350157Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5350593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5350703Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5351130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5351319Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5351818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5351922Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5352350Z File "/tmp/tmp2ix215xe/7q/c7qisuakrtqg5doqq3zk2rlnzbfaw7fv6mukeq7h5g2w52ecquyt.py", line 65, in
2025-12-04T10:35:20.5352748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5352835Z kernel.precompile(
2025-12-04T10:35:20.5353352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5353445Z self._precompile_worker()
2025-12-04T10:35:20.5353950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5354100Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5354606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5354768Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5355196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5355397Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5355815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5356107Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5356297Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5356848Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5356958Z ^
2025-12-04T10:35:20.5357352Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5357358Z
2025-12-04T10:35:20.5357962Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5357967Z
2025-12-04T10:35:20.5357971Z
2025-12-04T10:35:20.5358148Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5358896Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5358901Z
2025-12-04T10:35:20.5359126Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5359306Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5359433Z frames [('total', 1)]
2025-12-04T10:35:20.5359528Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5359927Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5360111Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5360192Z graph_break []
2025-12-04T10:35:20.5360365Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5360446Z frames [('total', 1)]
2025-12-04T10:35:20.5360550Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5360733Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5361122Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5361199Z graph_break []
2025-12-04T10:35:20.5361376Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5361471Z frames [('total', 1)]
2025-12-04T10:35:20.5361560Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5361746Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5362136Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5362212Z graph_break []
2025-12-04T10:35:20.5362808Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.xml -
2025-12-04T10:35:20.5362955Z =========================== short test summary info ============================
2025-12-04T10:35:20.5363670Z FAILED [0.3360s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5364230Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5364299Z ^
2025-12-04T10:35:20.5364751Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5364755Z
2025-12-04T10:35:20.5365366Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5365372Z
2025-12-04T10:35:20.5365376Z
2025-12-04T10:35:20.5365557Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5366354Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5366359Z
2025-12-04T10:35:20.5366622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5366774Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.5366940Z ================== 1 failed, 34 deselected, 2 rerun in 2.49s ===================
2025-12-04T10:35:20.5367021Z Got exit code 1
2025-12-04T10:35:20.5367106Z Retrying single test...
2025-12-04T10:35:20.5367506Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.xml
2025-12-04T10:35:20.5367640Z ============================= test session starts ==============================
2025-12-04T10:35:20.5367931Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.5368016Z cachedir: .pytest_cache
2025-12-04T10:35:20.5368464Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.5368563Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.5368653Z configfile: pytest.ini
2025-12-04T10:35:20.5369744Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.5369935Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.5370607Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5370709Z Running 1 items in this shard
2025-12-04T10:35:20.5370713Z
2025-12-04T10:35:20.5371857Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.5372791Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5373165Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10
2025-12-04T10:35:20.5373543Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096
2025-12-04T10:35:20.5373975Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.5374456Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5374947Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5375476Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5375968Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.5376465Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5376871Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.5377254Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.5377792Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5378324Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5378872Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.5379420Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5379867Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5380308Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5380722Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.5381168Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.5381561Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.5382244Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5382695Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5383192Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5383795Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.5384312Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.5384652Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] )
2025-12-04T10:35:20.5385197Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.5385753Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.5386316Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.5386908Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.5387352Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None]
2025-12-04T10:35:20.5387750Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None]
2025-12-04T10:35:20.5388150Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None]
2025-12-04T10:35:20.5388686Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5389133Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.5389591Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.5390131Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5390578Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5391022Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5391435Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.5391836Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.5392228Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.5392951Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5393403Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.5393818Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.5394201Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0
2025-12-04T10:35:20.5394624Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.5395008Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05
2025-12-04T10:35:20.5395427Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.5395929Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.5396343Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.5396826Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.5397329Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5397819Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.5398313Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20)
2025-12-04T10:35:20.5398767Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.5399156Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0
2025-12-04T10:35:20.5399644Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.5400026Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0
2025-12-04T10:35:20.5400506Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.5401003Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.5401537Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask)
2025-12-04T10:35:20.5402021Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.5402485Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask)
2025-12-04T10:35:20.5402782Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.5404743Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}],
'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5405200Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5406140Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5406671Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5407424Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5408146Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5408989Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5409646Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5410162Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5411148Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, 
in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5411453Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5412212Z E1204 10:26:41.681000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5412317Z ('RERUN', {'yellow': True}) [1.7702s] [100%] 2025-12-04T10:35:20.5413458Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.5414439Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5414802Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.5415180Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.5415562Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.5416074Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5416534Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5417025Z E1204 10:26:42.048000 86500 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5417441Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.5417905Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5418282Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.5418646Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.5419223Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5419720Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5420272Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5420766Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5421212Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5421656Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5422069Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5422511Z E1204 10:26:42.048000 86500 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5422901Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5423580Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5424018Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5424555Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5425162Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.5425697Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.5426059Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.5426605Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.5427121Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.5427729Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.5428330Z E1204 10:26:42.048000 86500 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.5428731Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.5429133Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.5429531Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.5430063Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.5430522Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.5430981Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.5431511Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5431957Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5432402Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5432815Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5433218Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5433660Z E1204 10:26:42.048000 86500 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5434342Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5434787Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.5435202Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.5435589Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.5436055Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.5436436Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.5436854Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.5437304Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.5437715Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.5438162Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.5438702Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5439202Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.5439694Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.5440111Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.5440498Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.5440978Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.5441370Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.5441855Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.5442308Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.5442874Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.5443359Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.5443825Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.5444127Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.5446138Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5446588Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5447517Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5448045Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5448805Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5449378Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5450185Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5450843Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T10:35:20.5451357Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5452285Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5452587Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5453356Z E1204 10:26:42.048000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5453467Z ('RERUN', {'yellow': True}) [0.3338s] [100%] 2025-12-04T10:35:20.5454605Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.5455570Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5455965Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.5456370Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.5456752Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
rnumel = r0_numel 2025-12-04T10:35:20.5457249Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5457702Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5458189Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5458602Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.5463088Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5463581Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.5463953Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.5464465Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5464971Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5465488Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5466031Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5466485Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 
2025-12-04T10:35:20.5466933Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5467358Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5467769Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5468172Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5468858Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5469309Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5469822Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5470475Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.5471002Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.5471344Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.5471905Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.5472466Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.5473032Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.5473636Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.5474037Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.5474442Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.5474885Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.5475423Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.5475919Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.5476382Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.5476875Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5477321Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5477818Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5478240Z E1204 10:26:42.383000 86500 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5478646Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5479050Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5479733Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5480187Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.5480615Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.5481008Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.5481441Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.5481869Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.5482299Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.5482751Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.5483176Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.5483623Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.5484164Z 
E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5484659Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.5485156Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.5485578Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.5486015Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.5486498Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.5486892Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.5487381Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.5487854Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.5488383Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.5488916Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.5489388Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.5489693Z E1204 
10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.5491632Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5492086Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5492988Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5493564Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5494326Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5494905Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5495690Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5496400Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5496918Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5497855Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5498229Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5498994Z E1204 10:26:42.383000 86500 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5499139Z FAILED [0.3335s] [100%] 2025-12-04T10:35:20.5499145Z 2025-12-04T10:35:20.5499269Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.5499616Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.5499719Z Traceback (most recent call last): 2025-12-04T10:35:20.5500079Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.5500274Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.5500740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.5500956Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.5501389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.5501548Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.5501990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.5502116Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.5502571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.5502840Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.5503284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.5503419Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.5503824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.5503930Z return self._compile_to_module() 2025-12-04T10:35:20.5504381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.5504518Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.5504965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.5505077Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.5505521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.5505839Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.5506444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.5506559Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.5506995Z File "/tmp/tmpmohm657b/ov/cov5vl5cspe2peu4mlvzmwz7kf5eg4iuhvcvury5x3haapw5vloh.py", line 65, in <module> 2025-12-04T10:35:20.5507391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.5507486Z kernel.precompile( 2025-12-04T10:35:20.5508231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.5508336Z self._precompile_worker() 2025-12-04T10:35:20.5508844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.5509081Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.5509592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5509758Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5510140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5510355Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5510725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5511014Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5511204Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.5511827Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5511906Z ^ 2025-12-04T10:35:20.5512298Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5512303Z 2025-12-04T10:35:20.5512917Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.5512921Z 2025-12-04T10:35:20.5512925Z 2025-12-04T10:35:20.5513108Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.5513871Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T10:35:20.5513879Z 2025-12-04T10:35:20.5514107Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.5514290Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5514383Z frames [('total', 1)] 2025-12-04T10:35:20.5514476Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5514874Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.5515132Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5515213Z graph_break [] 2025-12-04T10:35:20.5515577Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.5515690Z Traceback (most recent call last): 2025-12-04T10:35:20.5516067Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.5516272Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.5516685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.5516962Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.5517407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.5517567Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.5518007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.5518132Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.5518589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.5518951Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.5519440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.5519569Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.5519974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.5520077Z return self._compile_to_module() 2025-12-04T10:35:20.5520494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.5520630Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.5521063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.5521176Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.5521638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.5521838Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.5522340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.5522442Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.5522878Z File "/tmp/tmpucqg84je/xr/cxrau6ed7meq5ylwsfmuzj5zamphy7wpqql4eho35htnhxjphcyh.py", line 65, in <module>
2025-12-04T10:35:20.5523270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.5523363Z kernel.precompile( 2025-12-04T10:35:20.5523832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.5523926Z self._precompile_worker() 2025-12-04T10:35:20.5524443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.5524592Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.5525105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5525273Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5525735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5525958Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5526334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5526616Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5526813Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.5527368Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5527567Z ^ 2025-12-04T10:35:20.5527958Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5527963Z 2025-12-04T10:35:20.5528571Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.5528579Z 2025-12-04T10:35:20.5528583Z 2025-12-04T10:35:20.5528767Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.5529517Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T10:35:20.5529564Z 2025-12-04T10:35:20.5529793Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.5529978Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5530071Z frames [('total', 1)] 2025-12-04T10:35:20.5530164Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5530568Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.5530762Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5530842Z graph_break [] 2025-12-04T10:35:20.5531017Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5531105Z frames [('total', 1)] 2025-12-04T10:35:20.5531198Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5531378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5531818Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.5531899Z graph_break [] 2025-12-04T10:35:20.5532024Z =================================== FAILURES =================================== 2025-12-04T10:35:20.5532366Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T10:35:20.5532473Z Traceback (most recent call last): 2025-12-04T10:35:20.5532849Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.5533045Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.5533462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.5533672Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.5534111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.5534281Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.5534719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.5534837Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.5535295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.5535613Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.5536065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.5536189Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.5536594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.5536703Z return self._compile_to_module() 2025-12-04T10:35:20.5537116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.5537327Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.5537766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.5537871Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.5538300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.5538493Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.5538992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.5539158Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.5539629Z File "/tmp/tmpgdt6u5h_/vf/cvferfwynoum5tyqrtzfd3x6onywyyn64wlvg7xkdqifuotltolr.py", line 65, in <module> 2025-12-04T10:35:20.5540035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.5540127Z kernel.precompile( 2025-12-04T10:35:20.5540599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.5540699Z self._precompile_worker() 2025-12-04T10:35:20.5541210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.5541365Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.5541866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5542033Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5542466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5542674Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.5543045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5543336Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5543533Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.5544099Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5544171Z ^ 2025-12-04T10:35:20.5544567Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5544574Z 2025-12-04T10:35:20.5545198Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.5545204Z 2025-12-04T10:35:20.5545208Z 2025-12-04T10:35:20.5545391Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.5546235Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T10:35:20.5546240Z 2025-12-04T10:35:20.5546476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.5546659Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5546744Z frames [('total', 1)] 2025-12-04T10:35:20.5546838Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5547252Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.5547437Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.5547564Z graph_break [] 2025-12-04T10:35:20.5547757Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5547844Z frames [('total', 1)] 2025-12-04T10:35:20.5547954Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5548143Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5548540Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.5548628Z graph_break [] 2025-12-04T10:35:20.5548810Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.5548890Z frames [('total', 1)] 2025-12-04T10:35:20.5548996Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.5549226Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.5549627Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.5549708Z graph_break [] 2025-12-04T10:35:20.5550265Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.xml - 2025-12-04T10:35:20.5550417Z =========================== short test summary info ============================ 2025-12-04T10:35:20.5551147Z FAILED [0.3335s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.5551700Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5551791Z ^ 2025-12-04T10:35:20.5552233Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5552240Z 2025-12-04T10:35:20.5552859Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.5552863Z 2025-12-04T10:35:20.5552867Z 2025-12-04T10:35:20.5553061Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.5553815Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T10:35:20.5553820Z 2025-12-04T10:35:20.5554043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.5554199Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.5554385Z ================== 1 failed, 187 deselected, 2 rerun in 2.47s ================== 2025-12-04T10:35:20.5554466Z Got exit code 1 2025-12-04T10:35:20.5554565Z Retrying single test... 
2025-12-04T10:35:20.5554970Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.xml 2025-12-04T10:35:20.5555110Z ============================= test session starts ============================== 2025-12-04T10:35:20.5555463Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.5555555Z cachedir: .pytest_cache 2025-12-04T10:35:20.5556005Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.5556115Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.5556204Z configfile: pytest.ini 2025-12-04T10:35:20.5556675Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.5556872Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.5557589Z stepcurrent: skipping 34 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T10:35:20.5557694Z Running 1 items in this shard 2025-12-04T10:35:20.5557700Z 2025-12-04T10:35:20.5558851Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.5559790Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5560204Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.5560594Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.5560982Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.5561446Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5561920Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5562422Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5562896Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.5563368Z E1204 10:26:52.385000 86681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5563751Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.5564129Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.5564639Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5565147Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5565667Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5566204Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5566667Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5567164Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5567595Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5567996Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5568393Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5569092Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5569585Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5570102Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5570716Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.5571236Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.5571621Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.5572176Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.5572703Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.5573278Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.5573882Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.5574335Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.5574733Z E1204 10:26:52.385000 86681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.5575150Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.5575718Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.5576203Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.5576665Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.5577168Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5577626Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5578084Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5578577Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5578986Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5579455Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5580150Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5580607Z E1204 10:26:52.385000 86681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.5581073Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.5581460Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.5581900Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.5582291Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.5582710Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.5583227Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.5583651Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.5584104Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.5584614Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5585109Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.5585666Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.5586096Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.5586499Z 
E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.5586986Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.5587390Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.5587883Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.5588336Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.5588878Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.5589367Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.5589842Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.5590189Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.5592137Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 
'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5592637Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5593537Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5594078Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5594879Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5595463Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5596275Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5596948Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.5597466Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5598461Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, 
in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5598779Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5599553Z E1204 10:26:52.385000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5599678Z ('RERUN', {'yellow': True}) [1.7849s] [100%] 2025-12-04T10:35:20.5600824Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.5601762Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5602131Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.5602560Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.5602952Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.5603403Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5603877Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5604369Z E1204 10:26:52.752000 86681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5604844Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.5605463Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5606631Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.5607483Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.5608611Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5609811Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5610929Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5612127Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5613179Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5614186Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5615222Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5616216Z E1204 10:26:52.752000 86681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5617123Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5618320Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5619598Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5620645Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5621865Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.5623087Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.5624043Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.5625109Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.5626289Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.5627486Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.5628765Z E1204 10:26:52.752000 86681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.5629944Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.5630870Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.5631783Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.5632824Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.5633920Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.5634983Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.5636063Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5637109Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5638117Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5639084Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5640011Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5640998Z E1204 10:26:52.752000 86681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5642191Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5643433Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.5644409Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.5645325Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.5646293Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.5647223Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.5648141Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.5649128Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.5650159Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.5651131Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.5652184Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5653299Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.5654434Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.5655469Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.5656441Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.5657426Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.5658406Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.5659478Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.5660538Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.5661629Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.5662756Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.5663820Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.5664694Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.5667078Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.5669565Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.5671012Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.5672530Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.5673924Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.5675410Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.5676896Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.5678408Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T10:35:20.5679723Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.5681287Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5682631Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.5683799Z E1204 10:26:52.752000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.5684821Z ('RERUN', {'yellow': True}) [0.3338s] [100%] 2025-12-04T10:35:20.5686221Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.5688387Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.5689784Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T10:35:20.5690635Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.5691559Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
rnumel = r0_numel 2025-12-04T10:35:20.5692500Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.5693521Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.5694581Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.5695598Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.5696592Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.5697546Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.5698396Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.5699440Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5700546Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5701709Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.5702819Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5703865Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 
2025-12-04T10:35:20.5704876Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5705917Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5706862Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5707977Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5709164Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5710396Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.5711593Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5712905Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.5714222Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.5715241Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.5716311Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.5717645Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.5718846Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.5720121Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.5721228Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.5722145Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.5723053Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.5724100Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.5725187Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.5726202Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.5727322Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.5728371Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.5729373Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.5730351Z E1204 10:26:53.086000 86681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.5731338Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.5732249Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.5733436Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.5734673Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.5735646Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.5736633Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.5737557Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.5738473Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.5739441Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.5740436Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.5741417Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.5742431Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.5743488Z 
E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.5744602Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.5745703Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T10:35:20.5746773Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.5747692Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.5748686Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.5749676Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.5750660Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.5751756Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.5752847Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T10:35:20.5753971Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.5755043Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T10:35:20.5756002Z E1204 
10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.5758337Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5760845Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5762297Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5763826Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5765229Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5766657Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5768142Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5769660Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5770945Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5772497Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5773828Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5775011Z E1204 10:26:53.086000 86681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5776018Z FAILED [0.3324s] [100%]
2025-12-04T10:35:20.5776165Z
2025-12-04T10:35:20.5776293Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.5776908Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5777469Z Traceback (most recent call last):
2025-12-04T10:35:20.5778017Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5778685Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5779447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5780191Z     raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5780947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5781706Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5782402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5783069Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5783749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5784581Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5785414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5786189Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5786829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5787442Z     return self._compile_to_module()
2025-12-04T10:35:20.5788047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5793700Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5794457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5795232Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5795865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5796604Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5797501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5798224Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5798880Z   File "/tmp/tmpyteabnu3/fx/cfxidyigfwltuoh653wimpocbemcp5kcliomzxxd6gsqscd7xypm.py", line 65, in <module>
2025-12-04T10:35:20.5799838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5800441Z     kernel.precompile(
2025-12-04T10:35:20.5801064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5801750Z     self._precompile_worker()
2025-12-04T10:35:20.5802438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5803210Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5803980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5804778Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5805444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5806197Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5806947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5808105Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5808701Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5809555Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5810305Z ^
2025-12-04T10:35:20.5810794Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5811386Z
2025-12-04T10:35:20.5811997Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5812714Z
2025-12-04T10:35:20.5812718Z
2025-12-04T10:35:20.5812905Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5813945Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5814813Z
2025-12-04T10:35:20.5815040Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5815662Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5816065Z frames [('total', 1)]
2025-12-04T10:35:20.5816303Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5816878Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5817580Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5817961Z graph_break []
2025-12-04T10:35:20.5818430Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5818988Z Traceback (most recent call last):
2025-12-04T10:35:20.5819582Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5820250Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5820972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5821784Z     raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5822540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5823258Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5823967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5824640Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5825320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5826169Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5827004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5827692Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5828328Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5828962Z     return self._compile_to_module()
2025-12-04T10:35:20.5829573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5830229Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5831070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5831740Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5832377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5833105Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5833921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5834715Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5835355Z   File "/tmp/tmp44k0439s/5y/c5ye4jycncnvw4gwd7j4aup2rf4bhoqelymqqbtpxdumzremnr5q.py", line 65, in <module>
2025-12-04T10:35:20.5836350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5836951Z     kernel.precompile(
2025-12-04T10:35:20.5837576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5838295Z     self._precompile_worker()
2025-12-04T10:35:20.5839011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5839843Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5840711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5841500Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5842159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5842889Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5843587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5844360Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5844959Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5845850Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5846644Z ^
2025-12-04T10:35:20.5847135Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5847650Z
2025-12-04T10:35:20.5848254Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5848974Z
2025-12-04T10:35:20.5848977Z
2025-12-04T10:35:20.5849167Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5850208Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5851065Z
2025-12-04T10:35:20.5851287Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5851823Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5852209Z frames [('total', 1)]
2025-12-04T10:35:20.5852446Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5853024Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5853730Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5854112Z graph_break []
2025-12-04T10:35:20.5854459Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5854849Z frames [('total', 1)]
2025-12-04T10:35:20.5855087Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5855439Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5856136Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5856729Z graph_break []
2025-12-04T10:35:20.5856978Z =================================== FAILURES ===================================
2025-12-04T10:35:20.5857557Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _
2025-12-04T10:35:20.5858165Z Traceback (most recent call last):
2025-12-04T10:35:20.5858719Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant
2025-12-04T10:35:20.5859442Z     y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled)
2025-12-04T10:35:20.5860174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.5860914Z     raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.5861677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.5862386Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.5863143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.5863822Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.5864518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.5865359Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.5866281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.5866965Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.5867609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.5868235Z     return self._compile_to_module()
2025-12-04T10:35:20.5868896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.5869568Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.5870251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.5870917Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.5871550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.5872280Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.5873084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.5873808Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.5874457Z   File "/tmp/tmpmx1tjsyz/jy/cjya57nhlljiseo34fttqqnbztnnw6fpb4kfoonooc3w5yuzpswn.py", line 65, in <module>
2025-12-04T10:35:20.5875406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.5876012Z     kernel.precompile(
2025-12-04T10:35:20.5876641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.5877324Z     self._precompile_worker()
2025-12-04T10:35:20.5878002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.5878827Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.5879596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5880386Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5881042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5881754Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5882456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5883273Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5883868Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5884727Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5885475Z ^
2025-12-04T10:35:20.5886006Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5886526Z
2025-12-04T10:35:20.5887133Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5887908Z
2025-12-04T10:35:20.5887912Z
2025-12-04T10:35:20.5888093Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5889140Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5889998Z
2025-12-04T10:35:20.5890240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5890761Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5891145Z frames [('total', 1)]
2025-12-04T10:35:20.5891386Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5891956Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5892665Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5893095Z graph_break []
2025-12-04T10:35:20.5893405Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5893786Z frames [('total', 1)]
2025-12-04T10:35:20.5894030Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5894393Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5895090Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5895696Z graph_break []
2025-12-04T10:35:20.5896047Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.5896427Z frames [('total', 1)]
2025-12-04T10:35:20.5896666Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.5897026Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.5897726Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.5898314Z graph_break []
2025-12-04T10:35:20.5898998Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.xml -
2025-12-04T10:35:20.5899959Z =========================== short test summary info ============================
2025-12-04T10:35:20.5901009Z FAILED [0.3324s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.5902391Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5903132Z ^
2025-12-04T10:35:20.5903623Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5904131Z
2025-12-04T10:35:20.5904745Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.5905508Z
2025-12-04T10:35:20.5905512Z
2025-12-04T10:35:20.5905720Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.5906787Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5907652Z
2025-12-04T10:35:20.5908051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.5908683Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.5909117Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ==================
2025-12-04T10:35:20.5909482Z Got exit code 1
2025-12-04T10:35:20.5910241Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda
2025-12-04T10:35:20.5911244Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.5912110Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.xml
2025-12-04T10:35:20.5912765Z ============================= test session starts ==============================
2025-12-04T10:35:20.5913325Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.5913824Z cachedir: .pytest_cache
2025-12-04T10:35:20.5914419Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.5915084Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.5915370Z configfile: pytest.ini
2025-12-04T10:35:20.5916042Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.5916807Z collecting ... collected 188 items / 35 deselected / 153 selected
2025-12-04T10:35:20.5917234Z stepcurrent: skipping 35 already run items.
2025-12-04T10:35:20.5917548Z Running 153 items in this shard
2025-12-04T10:35:20.5917724Z
2025-12-04T10:35:20.5918961Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.5921369Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5922902Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.5923760Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.5924638Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5925717Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5926778Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.5927851Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.5928960Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.5930098Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.5931058Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rbase = r0_base
2025-12-04T10:35:20.5932096Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.5933195Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.5934209Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.5935321Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.5936378Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_index = r0_offset + r0_base
2025-12-04T10:35:20.5937387Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.5938362Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         roffset = r0_offset
2025-12-04T10:35:20.5939373Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         rindex = r0_index
2025-12-04T10:35:20.5940287Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_2 = r0_index
2025-12-04T10:35:20.5941269Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         r0_1 = r0_index // 512
2025-12-04T10:35:20.5942452Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.5943781Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5945059Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.5946199Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.5947175Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.5948075Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp5 = 512.0
2025-12-04T10:35:20.5948989Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.5949900Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp7 = 1e-05
2025-12-04T10:35:20.5950845Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.5951816Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.5952793Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.5953766Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.5954829Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.5955990Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.5957068Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.5958080Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.5959010Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp18 = -448.0
2025-12-04T10:35:20.5960050Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.5961032Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp20 = 448.0
2025-12-04T10:35:20.5962020Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.5963077Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.5964244Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]         tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.5965435Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.5966573Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.5967725Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.5968737Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.5971389Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.5974165Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.5975664Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.5977187Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.5978593Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.5980144Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.5981583Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.5983092Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.5984368Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.5986110Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5987635Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.5988824Z E1204 10:27:03.216000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.5989809Z ('RERUN', {'yellow': True}) [1.9043s] [ 0%]
2025-12-04T10:35:20.5991287Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.5993673Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.5995246Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xnumel = 1
2025-12-04T10:35:20.5996157Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     r0_numel = 5120
2025-12-04T10:35:20.5997038Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     rnumel = r0_numel
2025-12-04T10:35:20.5997988Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.5999014Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6000083Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6001179Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]     xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6002302Z E1204 10:27:03.656000 86862
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6003297Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6004329Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6005433Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.6006493Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.6007563Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6008946Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6009957Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6010922Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6011948Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6012864Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.6013800Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.6014976Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6016304Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6017669Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6018813Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6019909Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.6020809Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.6021721Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.6022638Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.6023538Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.6024515Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.6025498Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.6026469Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.6027635Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.6028746Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.6029828Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.6030844Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.6031878Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.6032874Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.6033859Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.6034848Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.6035911Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.6037131Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.6038327Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.6039369Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.6040525Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.6041574Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6044280Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6047060Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6048512Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6050046Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6051451Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6052938Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6054378Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6055899Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6057177Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6058920Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6060454Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6061637Z E1204 10:27:03.656000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6062663Z ('RERUN', {'yellow': True}) [0.4090s] [ 0%] 2025-12-04T10:35:20.6064109Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.6066509Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6068046Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.6068905Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.6069830Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.6070790Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6071813Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.6072881Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6073981Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.6075056Z E1204 10:27:04.066000 86862 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6076030Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6077068Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6078163Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.6079263Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.6080339Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6081396Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6082416Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6083429Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6084363Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6085282Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.6086223Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.6087397Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6088777Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6090062Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6091205Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6092178Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.6093081Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.6093999Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.6094963Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.6095873Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.6096844Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.6097827Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.6098804Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.6099926Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.6100422Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.6100902Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.6101322Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.6101897Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.6102396Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.6102779Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.6103273Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.6103730Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.6104373Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.6104872Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.6105304Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.6105904Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.6106279Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6108894Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6109471Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6110373Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6110914Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6111669Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6112252Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6113004Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6113665Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6114252Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6115433Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6115751Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6116515Z E1204 10:27:04.066000 86862 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6116665Z FAILED [0.4081s] [ 0%] 2025-12-04T10:35:20.6116670Z 2025-12-04T10:35:20.6116789Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.6117136Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T10:35:20.6117239Z Traceback (most recent call last): 2025-12-04T10:35:20.6117599Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6117802Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6118217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6118491Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6118929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6119089Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6119532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6119652Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6120113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6120382Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6120867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6121004Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6121412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.6121510Z return self._compile_to_module() 2025-12-04T10:35:20.6121923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6122062Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6122508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6122616Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6123034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6123237Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6123740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6123849Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6124272Z File "/tmp/tmpq6lie_52/ff/cffniubmfohrettsmmh2tfk6sstfl6nhgon5b6rvek6i4xyiqnxn.py", line 137, in 2025-12-04T10:35:20.6124664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6124807Z kernel.precompile( 2025-12-04T10:35:20.6125281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6125378Z self._precompile_worker() 2025-12-04T10:35:20.6125892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6126043Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6126552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6126761Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6127140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6127353Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6127766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6128056Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6128246Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6128940Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6129064Z ^ 2025-12-04T10:35:20.6129457Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6129462Z 2025-12-04T10:35:20.6130073Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6130080Z 2025-12-04T10:35:20.6130085Z 2025-12-04T10:35:20.6136623Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6137400Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6137415Z 2025-12-04T10:35:20.6137728Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6137922Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6138023Z frames [('total', 1)] 2025-12-04T10:35:20.6138123Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6138530Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6138729Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6138818Z graph_break [] 2025-12-04T10:35:20.6139222Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T10:35:20.6139329Z Traceback (most recent call last): 2025-12-04T10:35:20.6139693Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6139895Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6140318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6140533Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6140979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6141147Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.6141642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.6141765Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.6142219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.6142501Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.6142955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.6143085Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.6143536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.6143638Z return self._compile_to_module()
2025-12-04T10:35:20.6144053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.6144190Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6144632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6144748Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6145171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6145416Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6145914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6146021Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6146482Z File "/tmp/tmp3fletsr9/pq/cpqqhfzeruvcohwwqrokjmbvx5nocxo6h7vgenas534kh2hcw5qa.py", line 137, in
2025-12-04T10:35:20.6146878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6146975Z kernel.precompile( 2025-12-04T10:35:20.6147451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6147548Z self._precompile_worker() 2025-12-04T10:35:20.6148064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6148260Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6148767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6148941Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6149320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6149533Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6149906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6150190Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6150388Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6151082Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6151167Z ^ 2025-12-04T10:35:20.6151557Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6151562Z 2025-12-04T10:35:20.6152215Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6152227Z 2025-12-04T10:35:20.6152231Z 2025-12-04T10:35:20.6152414Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6153157Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6153165Z 2025-12-04T10:35:20.6153397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6153580Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6153743Z frames [('total', 1)] 2025-12-04T10:35:20.6153842Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6154242Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6154436Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6154517Z graph_break [] 2025-12-04T10:35:20.6154697Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6154790Z frames [('total', 1)] 2025-12-04T10:35:20.6154889Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6155072Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6155471Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6155594Z graph_break [] 2025-12-04T10:35:20.6155727Z =================================== FAILURES =================================== 2025-12-04T10:35:20.6156114Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T10:35:20.6156219Z Traceback (most recent call last): 2025-12-04T10:35:20.6156585Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6156783Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6157196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6157416Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6157900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6158216Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6158650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6158773Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6159234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6159508Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6159955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6160078Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6160484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6160598Z return self._compile_to_module() 2025-12-04T10:35:20.6161008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6161156Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6161594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6161703Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6162179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6162379Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6162877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6162987Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6163428Z File "/tmp/tmp57nr7q4e/3r/c3rs2r2zgrzp53qlhjntemg26khskch3b6jysnyonxqxg2qfehvj.py", line 137, in <module> 2025-12-04T10:35:20.6163831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6163963Z kernel.precompile( 2025-12-04T10:35:20.6164437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6164545Z self._precompile_worker() 2025-12-04T10:35:20.6165054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6165214Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6165743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6165935Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6166370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6166579Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:20.6166954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6167245Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6167439Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6168139Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6168217Z ^ 2025-12-04T10:35:20.6168613Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6168620Z 2025-12-04T10:35:20.6169281Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6169288Z 2025-12-04T10:35:20.6169292Z 2025-12-04T10:35:20.6169474Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6170268Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6170275Z 2025-12-04T10:35:20.6170498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6170687Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6170779Z frames [('total', 1)] 2025-12-04T10:35:20.6170880Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6171291Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6171480Z aot_autograd [('total', 1), ('autograd_cache_miss', 
1), ('not_ok', 1)] 2025-12-04T10:35:20.6171563Z graph_break [] 2025-12-04T10:35:20.6171749Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6171835Z frames [('total', 1)] 2025-12-04T10:35:20.6171938Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6172170Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6172563Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6172652Z graph_break [] 2025-12-04T10:35:20.6172832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6172915Z frames [('total', 1)] 2025-12-04T10:35:20.6173018Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6173205Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6173595Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6173729Z graph_break [] 2025-12-04T10:35:20.6174325Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.xml - 2025-12-04T10:35:20.6174480Z =========================== short test summary info ============================ 2025-12-04T10:35:20.6175203Z FAILED [0.4081s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6175927Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6176046Z ^ 2025-12-04T10:35:20.6176439Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6176446Z 2025-12-04T10:35:20.6177062Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6177067Z 2025-12-04T10:35:20.6177071Z 2025-12-04T10:35:20.6177255Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6178007Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6178012Z 2025-12-04T10:35:20.6178237Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6178387Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.6178609Z ================== 1 failed, 35 deselected, 2 rerun in 2.76s =================== 2025-12-04T10:35:20.6178693Z Got exit code 1 2025-12-04T10:35:20.6178788Z Retrying single test... 
2025-12-04T10:35:20.6179247Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.xml 2025-12-04T10:35:20.6179383Z ============================= test session starts ============================== 2025-12-04T10:35:20.6179686Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.6179775Z cachedir: .pytest_cache 2025-12-04T10:35:20.6180221Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.6180327Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.6180415Z configfile: pytest.ini 2025-12-04T10:35:20.6180882Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.6181072Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.6181746Z stepcurrent: skipping 35 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6181846Z Running 1 items in this shard 2025-12-04T10:35:20.6181852Z 2025-12-04T10:35:20.6183130Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.6184204Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6184610Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.6184996Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.6185386Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.6185837Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6186296Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.6186790Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6187331Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = 
tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.6187804Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6188183Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6188725Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6189169Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.6189675Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.6190170Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6190621Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6191078Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6191491Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6191907Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6192298Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.6192730Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.6193374Z E1204 10:27:13.979000 
87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6194025Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6194612Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6195060Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6195484Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.6195955Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.6196370Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.6196762Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.6197168Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.6197624Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.6198038Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.6198524Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.6199035Z E1204 10:27:13.979000 87064 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6199529Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.6200012Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.6200436Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.6200868Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.6201366Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.6201754Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.6202245Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.6202703Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.6203308Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.6203798Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.6204235Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.6204845Z E1204 10:27:13.979000 87064 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.6205189Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6207491Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6208328Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6209261Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6209795Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6210662Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6211242Z E1204 
10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6212004Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6212658Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6213236Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6214413Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6214723Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6215494Z E1204 10:27:13.979000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6215606Z ('RERUN', {'yellow': True}) [1.8966s] [100%] 2025-12-04T10:35:20.6216846Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.6217919Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6218342Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.6218731Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.6219168Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.6219632Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6220090Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.6220656Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6221153Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.6221621Z E1204 10:27:14.417000 87064 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6222006Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6222546Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6223043Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.6223508Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.6224004Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6224462Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6224906Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6225371Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6225779Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6226172Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.6226602Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.6227247Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6227833Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6228418Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6228875Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6229283Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.6229709Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.6230132Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.6230511Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.6230927Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.6231377Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.6231832Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.6232287Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.6232790Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.6233289Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.6233767Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.6234228Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.6234627Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.6235116Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.6235510Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.6235997Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.6236520Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.6237131Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.6237622Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.6238064Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.6238667Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.6238981Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6241267Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6241732Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6242625Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6243209Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6243970Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6244554Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6245306Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6246033Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6246585Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6247652Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6247968Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6248774Z E1204 10:27:14.417000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6248888Z ('RERUN', {'yellow': True}) [0.4071s] [100%] 2025-12-04T10:35:20.6250131Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T10:35:20.6251198Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6251563Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:20.6251950Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T10:35:20.6252347Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.6252803Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6253308Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.6253813Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6254311Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.6254800Z E1204 10:27:14.826000 87064 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6255179Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6255835Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6256296Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T10:35:20.6256762Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T10:35:20.6257268Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6257761Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6258217Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6258636Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6259083Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6259486Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T10:35:20.6259911Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T10:35:20.6260604Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + 
(r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6261193Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6261775Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T10:35:20.6262231Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6262647Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T10:35:20.6263041Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T10:35:20.6263463Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T10:35:20.6263849Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T10:35:20.6264266Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T10:35:20.6264761Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T10:35:20.6265183Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T10:35:20.6265625Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T10:35:20.6266132Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, 
R0_BLOCK]) 2025-12-04T10:35:20.6266634Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T10:35:20.6267150Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T10:35:20.6267582Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T10:35:20.6267973Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T10:35:20.6268466Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T10:35:20.6268854Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T10:35:20.6269381Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T10:35:20.6269849Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T10:35:20.6270452Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T10:35:20.6270951Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T10:35:20.6271387Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T10:35:20.6272030Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T10:35:20.6272340Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6274585Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6275054Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6275949Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6276537Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6277294Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6277878Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6278628Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6279345Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6279871Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6280944Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6281335Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6282108Z E1204 10:27:14.826000 87064 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6282214Z FAILED [0.4077s] [100%] 2025-12-04T10:35:20.6282219Z 2025-12-04T10:35:20.6282345Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.6282694Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T10:35:20.6282799Z Traceback (most recent call last): 2025-12-04T10:35:20.6283176Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6283390Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6283848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6284071Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6284526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6284697Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6285151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6285277Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6285742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6286038Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6286497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6286638Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6287054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.6287155Z return self._compile_to_module() 2025-12-04T10:35:20.6287628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6287774Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6288221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6288343Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6288768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6288985Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6289524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6289632Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6290086Z File "/tmp/tmplaf0delm/tx/ctxm7ilb5wrqs7qgfxksua4p4sl66noiuw7no2bc37qrpem5z4bc.py", line 137, in 2025-12-04T10:35:20.6290488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6290596Z kernel.precompile( 2025-12-04T10:35:20.6291079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6291178Z self._precompile_worker() 2025-12-04T10:35:20.6291704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6291900Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6292419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6292601Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6292984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6293210Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6293590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6293886Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6294095Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6294843Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6294925Z ^ 2025-12-04T10:35:20.6295320Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6295325Z 2025-12-04T10:35:20.6295940Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6295945Z 2025-12-04T10:35:20.6295958Z 2025-12-04T10:35:20.6296141Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6296887Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6296896Z 2025-12-04T10:35:20.6297142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6297334Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6297418Z frames [('total', 1)] 2025-12-04T10:35:20.6297526Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6297932Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6298177Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6298258Z graph_break [] 2025-12-04T10:35:20.6298593Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T10:35:20.6298711Z Traceback (most recent call last): 2025-12-04T10:35:20.6299109Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6299316Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6299748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6300014Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6300467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6300633Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.6301080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6301213Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6301674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6301969Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6302460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6302588Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6303011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6303117Z return self._compile_to_module() 2025-12-04T10:35:20.6303547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6303688Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6304131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6304249Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6304718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6304919Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6305437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6305550Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6306017Z File "/tmp/tmprhc1pp3a/gk/cgkk7yqjytg6b4cjserdwc3fycwq2i4pemvugnq7ym5of5cfywkh.py", line 137, in 
2025-12-04T10:35:20.6306418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6306515Z kernel.precompile( 2025-12-04T10:35:20.6306994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6307094Z self._precompile_worker() 2025-12-04T10:35:20.6307622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6308018Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6308620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6308802Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6309272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6309478Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6309859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6310142Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6310344Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6311038Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6311168Z ^ 2025-12-04T10:35:20.6311565Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6311570Z 2025-12-04T10:35:20.6312184Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6312189Z 2025-12-04T10:35:20.6312193Z 2025-12-04T10:35:20.6312383Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6313131Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6313191Z 2025-12-04T10:35:20.6313425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6313613Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6313698Z frames [('total', 1)] 2025-12-04T10:35:20.6313804Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6314204Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6314390Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6314479Z graph_break [] 2025-12-04T10:35:20.6314658Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6314746Z frames [('total', 1)] 2025-12-04T10:35:20.6314845Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6315031Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6315496Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6315578Z graph_break [] 2025-12-04T10:35:20.6315698Z =================================== FAILURES =================================== 2025-12-04T10:35:20.6316041Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T10:35:20.6316147Z Traceback (most recent call last): 2025-12-04T10:35:20.6316523Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6316725Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6317140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6317359Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6317800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6317962Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6318405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6318524Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6319031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6319305Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6319748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6319881Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6320292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6320400Z return self._compile_to_module() 2025-12-04T10:35:20.6320879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6321019Z mod = 
self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.6321472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.6321583Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.6322004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.6322209Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.6322710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.6322867Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.6323284Z File "/tmp/tmpk_2vk0w0/g5/cg5n4digy4cyupl5slyiujzzyjq77i2preqmwg6r4th3vq7cqwqd.py", line 137, in <module>
2025-12-04T10:35:20.6323682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.6323783Z kernel.precompile(
2025-12-04T10:35:20.6324257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.6324364Z self._precompile_worker()
2025-12-04T10:35:20.6324879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.6325031Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.6325546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6325763Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6326152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6326379Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6326759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6327055Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6327248Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6327941Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6328030Z ^
2025-12-04T10:35:20.6328432Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6328436Z
2025-12-04T10:35:20.6329058Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6329063Z
2025-12-04T10:35:20.6329067Z
2025-12-04T10:35:20.6329258Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6330067Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6330073Z
2025-12-04T10:35:20.6330301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6330485Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6330583Z frames [('total', 1)]
2025-12-04T10:35:20.6330688Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6331107Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6331339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6331428Z graph_break []
2025-12-04T10:35:20.6331622Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6331708Z frames [('total', 1)]
2025-12-04T10:35:20.6331806Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6332005Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6332411Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6332497Z graph_break []
2025-12-04T10:35:20.6332687Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.6332898Z frames [('total', 1)]
2025-12-04T10:35:20.6333004Z stats [('calls_captured', 10)]
2025-12-04T10:35:20.6333191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.6333592Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T10:35:20.6333683Z graph_break []
2025-12-04T10:35:20.6334246Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.xml -
2025-12-04T10:35:20.6334400Z =========================== short test summary info ============================
2025-12-04T10:35:20.6335127Z FAILED [0.4077s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.6335894Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6335996Z ^
2025-12-04T10:35:20.6336401Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6336406Z
2025-12-04T10:35:20.6337024Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.6337028Z
2025-12-04T10:35:20.6337035Z
2025-12-04T10:35:20.6337219Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.6338080Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6338093Z
2025-12-04T10:35:20.6338323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.6338481Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.6338658Z ================== 1 failed, 187 deselected, 2 rerun in 2.75s ==================
2025-12-04T10:35:20.6338740Z Got exit code 1
2025-12-04T10:35:20.6338827Z Retrying single test...
2025-12-04T10:35:20.6339293Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.xml
2025-12-04T10:35:20.6339487Z ============================= test session starts ==============================
2025-12-04T10:35:20.6339788Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.6339878Z cachedir: .pytest_cache
2025-12-04T10:35:20.6340323Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.6340426Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.6340521Z configfile: pytest.ini
2025-12-04T10:35:20.6340988Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.6341227Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.6341905Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda
2025-12-04T10:35:20.6342004Z Running 1 items in this shard
2025-12-04T10:35:20.6342011Z
2025-12-04T10:35:20.6343243Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6344313Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6344721Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.6345099Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:20.6345494Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.6345985Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6346470Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6347003Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6347504Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6347981Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6348360Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.6348913Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6349355Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6349820Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6350321Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6350770Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6351269Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6351685Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.6352096Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.6352500Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index
2025-12-04T10:35:20.6352970Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512
2025-12-04T10:35:20.6353623Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6354208Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6354796Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6355288Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6355698Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6356089Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0
2025-12-04T10:35:20.6356506Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6356891Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05
2025-12-04T10:35:20.6357295Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6357752Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6358216Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6358664Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6359182Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6359678Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6360165Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6360588Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6360984Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0
2025-12-04T10:35:20.6361505Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6361896Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0
2025-12-04T10:35:20.6362473Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6362936Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6363550Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6364054Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6364539Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6365157Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6365472Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.6367778Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6368281Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6369190Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6369765Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6370531Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6371125Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6371883Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6372558Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6373082Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6374158Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6374510Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6375281Z E1204 10:27:24.581000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6375393Z ('RERUN', {'yellow': True}) [1.8776s] [100%]
2025-12-04T10:35:20.6376679Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6377799Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6378161Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.6378546Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:20.6378933Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.6379481Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6379941Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6380434Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6380943Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6381511Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6381901Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.6382497Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6382952Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6383419Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6383912Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6384380Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6384833Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6385262Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.6385715Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.6386134Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index
2025-12-04T10:35:20.6386607Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512
2025-12-04T10:35:20.6387252Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6387846Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6388426Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6388911Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6389328Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6389710Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0
2025-12-04T10:35:20.6390133Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6390510Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05
2025-12-04T10:35:20.6390961Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6391419Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6391829Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6392273Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6392778Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6393267Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6393788Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6394212Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6394606Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0
2025-12-04T10:35:20.6395093Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6395477Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0
2025-12-04T10:35:20.6395969Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6396428Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6397035Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6397520Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6398000Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6398695Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6398998Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.6401264Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6401774Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6402717Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6403256Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6404024Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6404602Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6405426Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6406138Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6406670Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6408029Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6408424Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6409200Z E1204 10:27:25.020000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6409314Z ('RERUN', {'yellow': True}) [0.4079s] [100%]
2025-12-04T10:35:20.6410633Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1
2025-12-04T10:35:20.6411699Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6412076Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1
2025-12-04T10:35:20.6412459Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120
2025-12-04T10:35:20.6412909Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.6413372Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6413831Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6414335Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6414827Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6415356Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6415750Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.6416290Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6416742Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0))
2025-12-04T10:35:20.6417207Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1])
2025-12-04T10:35:20.6417758Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6418212Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6418665Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6419135Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.6419542Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.6419943Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index
2025-12-04T10:35:20.6420368Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512
2025-12-04T10:35:20.6421016Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6421607Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6422233Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0)
2025-12-04T10:35:20.6422690Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6423099Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2
2025-12-04T10:35:20.6423484Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0
2025-12-04T10:35:20.6423906Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5)
2025-12-04T10:35:20.6424332Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05
2025-12-04T10:35:20.6424746Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7
2025-12-04T10:35:20.6425198Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8)
2025-12-04T10:35:20.6425633Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9
2025-12-04T10:35:20.6426112Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10)
2025-12-04T10:35:20.6426659Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6427155Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12)
2025-12-04T10:35:20.6427627Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13)
2025-12-04T10:35:20.6428056Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16
2025-12-04T10:35:20.6428451Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0
2025-12-04T10:35:20.6428935Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18)
2025-12-04T10:35:20.6429375Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0
2025-12-04T10:35:20.6429866Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20)
2025-12-04T10:35:20.6430332Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv)
2025-12-04T10:35:20.6430926Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask)
2025-12-04T10:35:20.6431416Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None]
2025-12-04T10:35:20.6431853Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32)
2025-12-04T10:35:20.6432454Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None)
2025-12-04T10:35:20.6432762Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.6435041Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6435561Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6436499Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6437042Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6437797Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6438428Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target,
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6439177Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6439833Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6440354Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6441456Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6441772Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6442530Z E1204 10:27:25.430000 87266 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6442623Z FAILED [0.4078s] [100%] 2025-12-04T10:35:20.6442628Z 2025-12-04T10:35:20.6442746Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.6443081Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T10:35:20.6443189Z Traceback (most recent call last): 2025-12-04T10:35:20.6443546Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6443748Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6444172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6444387Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6444833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6445037Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6445478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6445605Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6446060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6451223Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6451704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6451904Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6452325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.6452426Z return self._compile_to_module() 2025-12-04T10:35:20.6452847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6452993Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6453438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6453551Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6454057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6454257Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6454766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6454878Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6455346Z File "/tmp/tmpezxridec/yn/cyntqhsjvsr3mrcrbdyk6euuldxkncwn3sh3lvcq3ku2l5nnwg7t.py", line 137, in 2025-12-04T10:35:20.6455745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6455839Z kernel.precompile( 2025-12-04T10:35:20.6456319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6456419Z self._precompile_worker() 2025-12-04T10:35:20.6456970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6457130Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6457638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6457813Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6458199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6458405Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6458784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6459136Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6459365Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6460112Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6460188Z ^ 2025-12-04T10:35:20.6460615Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6460621Z 2025-12-04T10:35:20.6461320Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6461325Z 2025-12-04T10:35:20.6461329Z 2025-12-04T10:35:20.6461516Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6462262Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6462269Z 2025-12-04T10:35:20.6462542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6462723Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6462811Z frames [('total', 1)] 2025-12-04T10:35:20.6462919Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6463324Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6463513Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6463604Z graph_break [] 2025-12-04T10:35:20.6463940Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T10:35:20.6464048Z Traceback (most recent call last): 2025-12-04T10:35:20.6464452Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6464647Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6465068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6465275Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6465712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6465881Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.6466311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6466438Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6466886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6467198Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6467649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6467769Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6468186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6468288Z return self._compile_to_module() 2025-12-04T10:35:20.6468698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6468839Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6469280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6469393Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6469823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6470017Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6470519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6470622Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6471116Z File "/tmp/tmpwgnsy2ge/p2/cp2slbvsckyo36qdautgtbjhvb7obiosuimpvfzy66mivlarwy4b.py", line 137, in 
2025-12-04T10:35:20.6471514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6471612Z kernel.precompile( 2025-12-04T10:35:20.6472091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6472196Z self._precompile_worker() 2025-12-04T10:35:20.6472702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6472900Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6473404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6473572Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6473957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6474164Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6474544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6474827Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6475063Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6475860Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6475934Z ^ 2025-12-04T10:35:20.6476335Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6476343Z 2025-12-04T10:35:20.6476954Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6476959Z 2025-12-04T10:35:20.6476964Z 2025-12-04T10:35:20.6477151Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6477941Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6477952Z 2025-12-04T10:35:20.6478176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6478369Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6478456Z frames [('total', 1)] 2025-12-04T10:35:20.6478552Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6478957Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6479149Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6479234Z graph_break [] 2025-12-04T10:35:20.6479415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6479498Z frames [('total', 1)] 2025-12-04T10:35:20.6479607Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6479797Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6480197Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6480287Z graph_break [] 2025-12-04T10:35:20.6480406Z =================================== FAILURES =================================== 2025-12-04T10:35:20.6480746Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T10:35:20.6480889Z Traceback (most recent call last): 2025-12-04T10:35:20.6481251Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6481459Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6481878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6482099Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6482538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6482743Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6483186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6483305Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6483762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6484042Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6484488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6484664Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6485083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6485186Z return self._compile_to_module() 2025-12-04T10:35:20.6485609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6485745Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6486189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6486296Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6486714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6486916Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6487461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6487571Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6488029Z File "/tmp/tmpm00rcv7e/yj/cyjmet6qihmitx5eqm2uuxqdg5ogetcaau2encawtmy5j6p7ntr2.py", line 137, in 2025-12-04T10:35:20.6488423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6488524Z kernel.precompile( 2025-12-04T10:35:20.6489001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6489098Z self._precompile_worker() 2025-12-04T10:35:20.6489614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6489762Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6490284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6490451Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6490833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6491047Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.6491462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6491747Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6491951Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6492649Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6492732Z ^ 2025-12-04T10:35:20.6493127Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6493198Z 2025-12-04T10:35:20.6493805Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6493816Z 2025-12-04T10:35:20.6493820Z 2025-12-04T10:35:20.6494005Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6494750Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6494755Z 2025-12-04T10:35:20.6494990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6495212Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6495303Z frames [('total', 1)] 2025-12-04T10:35:20.6495398Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6495798Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6495984Z aot_autograd [('total', 1), ('autograd_cache_miss', 
1), ('not_ok', 1)] 2025-12-04T10:35:20.6496061Z graph_break [] 2025-12-04T10:35:20.6496243Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6496333Z frames [('total', 1)] 2025-12-04T10:35:20.6496428Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6496614Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6497010Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6497087Z graph_break [] 2025-12-04T10:35:20.6497313Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6497398Z frames [('total', 1)] 2025-12-04T10:35:20.6497491Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6497680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6498069Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T10:35:20.6498158Z graph_break [] 2025-12-04T10:35:20.6498722Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.xml - 2025-12-04T10:35:20.6498869Z =========================== short test summary info ============================ 2025-12-04T10:35:20.6499661Z FAILED [0.4078s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6500352Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6500436Z ^ 2025-12-04T10:35:20.6500828Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6500833Z 2025-12-04T10:35:20.6501484Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6501489Z 2025-12-04T10:35:20.6501492Z 2025-12-04T10:35:20.6501679Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6502421Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6502428Z 2025-12-04T10:35:20.6502660Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6502812Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.6503019Z ================== 1 failed, 187 deselected, 2 rerun in 2.73s ================== 2025-12-04T10:35:20.6503107Z Got exit code 1 2025-12-04T10:35:20.6503639Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T10:35:20.6503995Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.6504398Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.xml 2025-12-04T10:35:20.6504535Z ============================= test session starts ============================== 2025-12-04T10:35:20.6504839Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.6504974Z cachedir: .pytest_cache 2025-12-04T10:35:20.6505428Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.6505531Z 
rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.6505624Z configfile: pytest.ini 2025-12-04T10:35:20.6506094Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.6506290Z collecting ... collected 188 items / 36 deselected / 152 selected 2025-12-04T10:35:20.6506412Z stepcurrent: skipping 36 already run items. 2025-12-04T10:35:20.6506526Z Running 152 items in this shard 2025-12-04T10:35:20.6506530Z 2025-12-04T10:35:20.6507985Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.6509072Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6509460Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.6509859Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.6510255Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.6510716Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6511200Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.6511699Z E1204 10:27:35.153000 87468 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6512215Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.6512773Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6513168Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6513539Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.6514052Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6514709Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6515222Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6515720Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6516178Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6516633Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6517131Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6517539Z E1204 10:27:35.153000 87468 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6517947Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6518615Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6519060Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6519568Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6520247Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.6520782Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.6521120Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.6521641Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.6522160Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.6522715Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.6523336Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, 
tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.6523746Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.6524197Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.6524602Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.6525141Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6525612Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.6526134Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.6527391Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6527850Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6528310Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6528738Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6529190Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6529594Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6530272Z E1204 
10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6530730Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.6531160Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.6531551Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.6532027Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.6532415Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.6532834Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.6533295Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.6533711Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.6534155Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.6534655Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6535150Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.6535629Z E1204 10:27:35.153000 87468 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.6536100Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.6536562Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.6537052Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.6537441Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.6537931Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.6538428Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.6538934Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.6539490Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.6539965Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.6540264Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6542324Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 
'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6542781Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6543718Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6544251Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6545006Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6545589Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6546341Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6547010Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6547529Z E1204 10:27:35.153000 87468 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6548509Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6548815Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6549575Z E1204 10:27:35.153000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6549695Z ('RERUN', {'yellow': True}) [1.8138s] [ 0%] 2025-12-04T10:35:20.6550857Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.6551833Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6552204Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.6552583Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.6553012Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.6553467Z E1204 10:27:35.529000 87468 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6553933Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.6554430Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6554933Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.6555399Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6555854Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6556235Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.6556740Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6557246Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6557756Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6558249Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6558705Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6559152Z E1204 10:27:35.529000 
87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6559571Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6560020Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6560424Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6561085Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6561533Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6562076Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6562687Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.6563198Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.6563532Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.6564053Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.6564594Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 
2025-12-04T10:35:20.6565143Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.6565754Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.6566208Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.6566618Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.6567055Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.6567592Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6568046Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.6568510Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.6569004Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6569452Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6569903Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6570322Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 
2025-12-04T10:35:20.6570726Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6571171Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6571835Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6572284Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.6572708Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.6573094Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.6573564Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.6573945Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.6574366Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.6574828Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.6575251Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.6575743Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.6576243Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6576743Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.6577220Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.6577642Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.6578040Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.6578603Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.6578996Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.6579535Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.6579995Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.6580500Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.6580986Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.6581458Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.6581760Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
2025-12-04T10:35:20.6583862Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6584324Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6585261Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6585795Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6586558Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6587148Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6587937Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, 
in make_ir 2025-12-04T10:35:20.6588606Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6589125Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6590065Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6590418Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6591183Z E1204 10:27:35.529000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6591305Z ('RERUN', {'yellow': True}) [0.3427s] [ 0%] 2025-12-04T10:35:20.6592468Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.6593403Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6593783Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.6594179Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.6594562Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.6595063Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6595531Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.6596068Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6596576Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.6597086Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6597467Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6597834Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.6598342Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6598850Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6599401Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6599896Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6600350Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6600801Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6601215Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6601636Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6602091Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6602765Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', 
other=0.0).to(tl.float32) 2025-12-04T10:35:20.6603220Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6603736Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6604352Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.6604890Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.6605233Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.6605798Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.6606360Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.6606923Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.6607537Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.6608237Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.6608679Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.6609175Z E1204 10:27:35.873000 87468 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.6609725Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6610189Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.6610660Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.6611171Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6611685Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6612135Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6612577Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6612987Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6613399Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6614120Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6614593Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.6615030Z E1204 10:27:35.873000 87468 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.6615431Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.6615868Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.6616268Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.6616701Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.6617161Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.6617584Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.6618049Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.6618607Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6619163Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.6619647Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.6620073Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.6620537Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.6621032Z E1204 10:27:35.873000 
87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.6621433Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.6621920Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.6622380Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.6622974Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.6623473Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.6623961Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.6624273Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6626401Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6626876Z 
E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6627788Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6628329Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6629109Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6629696Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6630502Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6631171Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6631697Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6632645Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6632999Z E1204 
10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6633782Z E1204 10:27:35.873000 87468 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6633872Z FAILED [0.3429s] [ 0%] 2025-12-04T10:35:20.6633878Z 2025-12-04T10:35:20.6634001Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.6634366Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.6634514Z Traceback (most recent call last): 2025-12-04T10:35:20.6634878Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6635090Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6635512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6635768Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6636213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6636375Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6636817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6636941Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6637445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6637720Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6638172Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6638301Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6638710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6638813Z return self._compile_to_module() 2025-12-04T10:35:20.6639226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6639364Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6639818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6639930Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6640350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6640550Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6641053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6641206Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6641646Z File "/tmp/tmpda4qg5z6/dc/cdclxrasc7tnmn2qdxjuzbb62bszhhlc4uedhzudqv2wqb7b3uhc.py", line 65, in <module> 2025-12-04T10:35:20.6642043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6642144Z kernel.precompile( 2025-12-04T10:35:20.6642620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6642723Z self._precompile_worker() 2025-12-04T10:35:20.6643283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line
463, in _precompile_worker 2025-12-04T10:35:20.6643433Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6643948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6644117Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6644501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6644722Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6645099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6645447Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6645644Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6646204Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6646286Z ^ 2025-12-04T10:35:20.6646685Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6646690Z 2025-12-04T10:35:20.6647308Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6647313Z 2025-12-04T10:35:20.6647317Z 2025-12-04T10:35:20.6647502Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6648301Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.6648315Z 2025-12-04T10:35:20.6648546Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6648734Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6648828Z frames [('total', 1)] 2025-12-04T10:35:20.6648929Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6649346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6649548Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6649633Z graph_break [] 2025-12-04T10:35:20.6649993Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.6650103Z Traceback (most recent call last): 2025-12-04T10:35:20.6650473Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6650683Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6651105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6651315Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6651807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6651970Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.6652419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6652543Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6653011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6653304Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6653790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6653922Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6654334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6654434Z return self._compile_to_module() 2025-12-04T10:35:20.6654855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6654991Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6655436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6655599Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6656068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6656275Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6656783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6656897Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6657349Z File "/tmp/tmpvn380q15/qm/cqm2xrqur3j5xpu5vzdagyhepg23xyhngtfgcteeihnfpwyyneq4.py", line 65, in <module>
2025-12-04T10:35:20.6657756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6657862Z kernel.precompile( 2025-12-04T10:35:20.6658380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6658483Z self._precompile_worker() 2025-12-04T10:35:20.6659009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6659244Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6659756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6659938Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6660320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6660545Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6660919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6661216Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6661431Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6661993Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6662069Z ^ 2025-12-04T10:35:20.6662507Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6662512Z 2025-12-04T10:35:20.6663121Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6663126Z 2025-12-04T10:35:20.6663136Z 2025-12-04T10:35:20.6663318Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6664080Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.6664150Z 2025-12-04T10:35:20.6664381Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6664561Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6664646Z frames [('total', 1)] 2025-12-04T10:35:20.6664749Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6665150Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6665344Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6665422Z graph_break [] 2025-12-04T10:35:20.6665598Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6665731Z frames [('total', 1)] 2025-12-04T10:35:20.6665825Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6666007Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6666411Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6666492Z graph_break [] 2025-12-04T10:35:20.6666617Z =================================== FAILURES =================================== 2025-12-04T10:35:20.6666967Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.6667065Z Traceback (most recent call last): 2025-12-04T10:35:20.6667434Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6667629Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6668041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6668301Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6668742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6668908Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6669340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6669462Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6669919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6670195Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6670651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6670781Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6671192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6671302Z return self._compile_to_module() 2025-12-04T10:35:20.6671715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6671863Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6672346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6672456Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6672885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6673079Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6673588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6673742Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6674184Z File "/tmp/tmp5gxiw3nv/zm/czmvsiylcrotx6rygblxanr2ss6pekd4iffo5u36pmlqw4sfu34f.py", line 65, in <module> 2025-12-04T10:35:20.6674592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6674687Z kernel.precompile( 2025-12-04T10:35:20.6675165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6675271Z self._precompile_worker() 2025-12-04T10:35:20.6675782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6675978Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6676491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6676661Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6677054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6677265Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:20.6677649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6677940Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6678137Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6678748Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6678823Z ^ 2025-12-04T10:35:20.6679220Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6679227Z 2025-12-04T10:35:20.6679842Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6679846Z 2025-12-04T10:35:20.6679850Z 2025-12-04T10:35:20.6680032Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6680797Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.6680803Z 2025-12-04T10:35:20.6681034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6681228Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6681315Z frames [('total', 1)] 2025-12-04T10:35:20.6681412Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6681831Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6682015Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.6682098Z graph_break [] 2025-12-04T10:35:20.6682332Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6682420Z frames [('total', 1)] 2025-12-04T10:35:20.6682512Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6682710Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6683107Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6683193Z graph_break [] 2025-12-04T10:35:20.6683376Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6683461Z frames [('total', 1)] 2025-12-04T10:35:20.6683607Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6683791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6684188Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6684273Z graph_break [] 2025-12-04T10:35:20.6684837Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.xml - 2025-12-04T10:35:20.6684988Z =========================== short test summary info ============================ 2025-12-04T10:35:20.6685721Z FAILED [0.3429s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6686320Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6686401Z ^ 2025-12-04T10:35:20.6686801Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6686806Z 2025-12-04T10:35:20.6687427Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6687432Z 2025-12-04T10:35:20.6687436Z 2025-12-04T10:35:20.6687625Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6688393Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.6688401Z 2025-12-04T10:35:20.6688675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6688832Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.6689021Z ================== 1 failed, 36 deselected, 2 rerun in 2.53s =================== 2025-12-04T10:35:20.6689110Z Got exit code 1 2025-12-04T10:35:20.6689210Z Retrying single test... 
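Every attempt in this log fails the same way. The kernel metadata reports `'cc': 86`, and Triton refuses `fp8e4nv` (the Triton type the kernel uses for `torch.float8_e4m3fn`, visible as `'out_ptr3': '*fp8e4nv'`), offering only `fp8e4b15` and `fp8e5` on this architecture. The single-test retry that follows hits the same error. As a minimal sketch, a device gate for such tests could look like the following. It assumes that `fp8e4nv` needs compute capability 8.9 or newer, which the log itself does not state (it only shows that 8.6 is rejected); the helper names are hypothetical and this is not the test suite's actual skip logic.

```python
# Minimal sketch: decide whether a CUDA device can compile Triton kernels that use fp8e4nv.
# ASSUMPTION: fp8e4nv requires compute capability >= (8, 9); the log only shows that cc 86 fails.
# Helper names are hypothetical, not part of PyTorch or Triton.
from typing import Tuple

FP8E4NV_MIN_CAPABILITY: Tuple[int, int] = (8, 9)  # assumed threshold


def supports_fp8e4nv(capability: Tuple[int, int]) -> bool:
    """Return True if (major, minor) meets the assumed fp8e4nv threshold."""
    return tuple(capability) >= FP8E4NV_MIN_CAPABILITY


def capability_from_triton_cc(cc: int) -> Tuple[int, int]:
    """Convert the integer 'cc' field from Triton kernel metadata (e.g. 86) into (major, minor)."""
    major, minor = divmod(cc, 10)
    return (major, minor)


if __name__ == "__main__":
    cc = capability_from_triton_cc(86)  # value reported in the kernel metadata of this log
    print(cc, supports_fp8e4nv(cc))  # prints "(8, 6) False": the float8_e4m3fn variants should be skipped
```

On a live machine the tuple would come from `torch.cuda.get_device_capability()`, which returns `(major, minor)`, rather than from the `cc` field of a failed compile.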
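For readers tracing the dumped kernel `triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0`: it makes two passes over each 4096-element row (`xnumel = 8192` rows, `r0_numel = 4096`). The first pass accumulates mean and M2 with a Welford reduction; the second normalizes with `rsqrt(M2 / 4096 + 1e-05)`, records the row's maximum absolute normalized value (the `amax` written to `out_ptr2`), multiplies by the scalar loaded from `in_ptr1`, clamps to [-448, 448], and casts to `fp8e4nv`. Below is a plain-Python reference sketch of that math for one row. It uses a textbook sequential Welford update rather than Inductor's blocked `triton_helpers.welford_reduce`, omits the final fp8 rounding, and its function names are hypothetical.

```python
import math
from typing import List, Tuple

# Clamp bound used by the dumped kernel (tmp25 = -448.0, tmp27 = 448.0),
# which is the largest finite float8_e4m3fn magnitude.
FP8_E4M3FN_MAX = 448.0


def welford(row: List[float]) -> Tuple[float, float, int]:
    """Textbook sequential Welford pass: returns (mean, M2, count)."""
    mean, m2, count = 0.0, 0.0, 0
    for v in row:
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)  # uses the already-updated mean
    return mean, m2, count


def layernorm_fp8_quant_row(
    row: List[float], scale: float, eps: float = 1e-5
) -> Tuple[List[float], float]:
    """Hypothetical reference for one row: returns (scaled and clamped values, amax)."""
    mean, m2, count = welford(row)
    # Biased variance, matching tmp13 = tmp7 / 4096.0 in the kernel.
    rstd = 1.0 / math.sqrt(m2 / count + eps)
    normed = [(v - mean) * rstd for v in row]
    # amax is taken on the normalized values, before scaling (tmp18 feeding _tmp20).
    amax = max(abs(v) for v in normed)
    # Scale, then clamp to the e4m3fn range; the real kernel then casts to fp8 (omitted here).
    clamped = [min(max(v * scale, -FP8_E4M3FN_MAX), FP8_E4M3FN_MAX) for v in normed]
    return clamped, amax


if __name__ == "__main__":
    values, amax = layernorm_fp8_quant_row([1.0, 2.0, 3.0, 4.0], scale=1000.0)
    print(values, amax)
```

The two passes mirror the kernel's two `for r0_offset in tl.range(...)` loops: statistics must be complete before any element can be normalized, which is why the row is read twice (`evict_last` on the first load, `evict_first` on the second).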
2025-12-04T10:35:20.6689625Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.xml 2025-12-04T10:35:20.6689765Z ============================= test session starts ============================== 2025-12-04T10:35:20.6690075Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.6690168Z cachedir: .pytest_cache 2025-12-04T10:35:20.6690622Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.6690746Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.6690837Z configfile: pytest.ini 2025-12-04T10:35:20.6691322Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.6691515Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.6692327Z stepcurrent: skipping 36 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.6692440Z Running 1 items in this shard 2025-12-04T10:35:20.6692445Z 2025-12-04T10:35:20.6693618Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.6694575Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6694994Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.6695376Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.6695788Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.6696248Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6696723Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.6697268Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6697781Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T10:35:20.6698255Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6698647Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6699071Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.6699583Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6700136Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6700659Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6701155Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6701624Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6702075Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6702515Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6702937Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6703339Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6704088Z E1204 10:27:45.851000 87649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6704542Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6705054Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6705677Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.6706206Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.6706589Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.6707120Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.6707632Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.6708495Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.6709213Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.6709623Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.6710025Z E1204 10:27:45.851000 87649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.6710438Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.6710981Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6711452Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.6711983Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.6712485Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6712950Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6713401Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6713843Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6714262Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6714674Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6715350Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6715823Z E1204 10:27:45.851000 87649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.6716363Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.6716754Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.6717199Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.6717605Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.6718031Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.6718568Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.6718999Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.6719457Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.6719965Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6720470Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.6721005Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.6721428Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.6721833Z E1204 
10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.6722323Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.6722722Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.6723262Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.6723732Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.6724253Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.6724753Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.6725241Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.6725553Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6727623Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): 
[['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6728100Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6728996Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6729547Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6730351Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6730950Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6731707Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6732386Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6732954Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6733918Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6734240Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6735016Z E1204 10:27:45.851000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6735140Z ('RERUN', {'yellow': True}) [1.8093s] [100%] 2025-12-04T10:35:20.6736357Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.6737310Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6737684Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.6738080Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.6738485Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T10:35:20.6738947Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6739495Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * 
XBLOCK 2025-12-04T10:35:20.6739991Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6740548Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.6741023Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6741406Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6741788Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.6742338Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6742854Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6743375Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6743876Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6744346Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6744843Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6745274Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 
2025-12-04T10:35:20.6745687Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6746141Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6746814Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6747259Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6747842Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6748469Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.6748994Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.6749343Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.6756575Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.6757159Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.6757816Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.6758458Z E1204 10:27:46.228000 87649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.6758940Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.6759345Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.6759752Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.6760296Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6760800Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.6761269Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.6761763Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6762219Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6762665Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6763137Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6763543Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6763951Z E1204 10:27:46.228000 87649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6764622Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6765078Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.6765509Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.6765942Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.6766375Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.6766767Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.6767196Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.6767659Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.6768082Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.6768541Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.6769053Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6769546Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.6770080Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.6770500Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.6770900Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.6771390Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.6771775Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.6772314Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.6772774Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.6773290Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.6773783Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.6774256Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.6774605Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6776628Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': 
'*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6777129Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6778033Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6778572Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6779403Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6779990Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6780747Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6781409Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T10:35:20.6781973Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6782912Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6783226Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6783995Z E1204 10:27:46.228000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6784152Z ('RERUN', {'yellow': True}) [0.3439s] [100%] 2025-12-04T10:35:20.6785324Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T10:35:20.6786266Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6786644Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T10:35:20.6787064Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T10:35:20.6787457Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
rnumel = r0_numel 2025-12-04T10:35:20.6787917Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T10:35:20.6788383Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.6788877Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T10:35:20.6789381Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T10:35:20.6789902Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T10:35:20.6790287Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T10:35:20.6790659Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.6791168Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6791668Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6792182Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T10:35:20.6792681Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6793134Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = 
r0_offset + r0_base 2025-12-04T10:35:20.6793580Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6794045Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6794449Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6794842Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6795509Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6795996Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.6796500Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6797111Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T10:35:20.6797631Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T10:35:20.6797972Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T10:35:20.6798561Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T10:35:20.6799064Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T10:35:20.6799617Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T10:35:20.6800222Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T10:35:20.6800629Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T10:35:20.6801075Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T10:35:20.6801480Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T10:35:20.6802024Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T10:35:20.6802484Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T10:35:20.6802946Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T10:35:20.6803434Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T10:35:20.6803897Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T10:35:20.6804353Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T10:35:20.6804779Z E1204 10:27:46.571000 87649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T10:35:20.6805232Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T10:35:20.6805633Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T10:35:20.6806351Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T10:35:20.6806804Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T10:35:20.6807301Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T10:35:20.6807692Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T10:35:20.6808455Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T10:35:20.6808849Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T10:35:20.6809268Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T10:35:20.6809733Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T10:35:20.6810243Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T10:35:20.6810696Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T10:35:20.6811201Z E1204 
10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T10:35:20.6811693Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T10:35:20.6812175Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T10:35:20.6812593Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T10:35:20.6813049Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T10:35:20.6813542Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T10:35:20.6813925Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T10:35:20.6814423Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T10:35:20.6814878Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T10:35:20.6815389Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T10:35:20.6815887Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T10:35:20.6816360Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T10:35:20.6816660Z E1204 10:27:46.571000 87649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.6818737Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.6819305Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.6820200Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6820740Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6821500Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6822125Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6822876Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6823602Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6824182Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.6825180Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6825494Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.6826258Z E1204 10:27:46.571000 87649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6826347Z FAILED [0.3421s] [100%] 2025-12-04T10:35:20.6826353Z 2025-12-04T10:35:20.6826471Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.6826822Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.6826924Z Traceback (most recent call last): 2025-12-04T10:35:20.6827283Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6827494Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6827909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6828123Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6828568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6828774Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6829214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6829335Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6829790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6830071Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6830516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6830687Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6831093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.6831196Z return self._compile_to_module() 2025-12-04T10:35:20.6831612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6831746Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6832183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6832297Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6832763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6832968Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6833471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6833575Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6834018Z File "/tmp/tmp0waadcb3/lt/cltt5eksho3vm3dp6rgm62r2zcrl2k3djay2ye2ud5knou7ih2ln.py", line 65, in 2025-12-04T10:35:20.6834417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6834516Z kernel.precompile( 2025-12-04T10:35:20.6834989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6835087Z self._precompile_worker() 2025-12-04T10:35:20.6835643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6835796Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6836306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6836476Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6836862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6837070Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6837443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6837722Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6837925Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6838478Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6838557Z ^ 2025-12-04T10:35:20.6839073Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6839078Z 2025-12-04T10:35:20.6839738Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6839750Z 2025-12-04T10:35:20.6839754Z 2025-12-04T10:35:20.6839938Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6840698Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.6840705Z 2025-12-04T10:35:20.6840934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6841183Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6841274Z frames [('total', 1)] 2025-12-04T10:35:20.6841372Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6841782Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6841982Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6842065Z graph_break [] 2025-12-04T10:35:20.6842410Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.6842520Z Traceback (most recent call last): 2025-12-04T10:35:20.6842885Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6843129Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6843552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6843766Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6844210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6844380Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.6844817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6844946Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6845401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6845727Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6846175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6846301Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6846719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6846818Z return self._compile_to_module() 2025-12-04T10:35:20.6847239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6847374Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6847816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6847928Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6848351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6848544Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6849056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6849163Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6849678Z File "/tmp/tmpuzx4q8fc/qg/cqgu4noh5scrxf2qf3gkpdw36bu5clypbzntfxxutzenjieujlo2.py", line 65, in 
2025-12-04T10:35:20.6850074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6850163Z kernel.precompile( 2025-12-04T10:35:20.6850638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6850738Z self._precompile_worker() 2025-12-04T10:35:20.6851255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6851447Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6851953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6852129Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6852513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6852720Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.6853100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6853381Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6853623Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6854178Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6854250Z ^ 2025-12-04T10:35:20.6854644Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6854649Z 2025-12-04T10:35:20.6855255Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6855260Z 2025-12-04T10:35:20.6855264Z 2025-12-04T10:35:20.6855453Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6856295Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.6856303Z 2025-12-04T10:35:20.6856535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6856721Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6856805Z frames [('total', 1)] 2025-12-04T10:35:20.6856909Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6857314Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6857498Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6857583Z graph_break [] 2025-12-04T10:35:20.6857759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6857851Z frames [('total', 1)] 2025-12-04T10:35:20.6857943Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6858131Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6858545Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6858625Z graph_break [] 2025-12-04T10:35:20.6858747Z =================================== FAILURES =================================== 2025-12-04T10:35:20.6859158Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.6859257Z Traceback (most recent call last): 2025-12-04T10:35:20.6859671Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.6859868Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.6860282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.6860503Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.6860951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.6861153Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.6861597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.6861714Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.6862174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.6862447Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.6862888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.6863013Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.6863467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.6863569Z return self._compile_to_module() 2025-12-04T10:35:20.6863988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.6864126Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.6864577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.6864682Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.6865101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.6865299Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.6865846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.6866005Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.6866434Z File "/tmp/tmpto3tdc64/hc/chcj4h7nlexnlwy5u3m3zrqjy52nrim6jdsq5kw67oriq3by3id7.py", line 65, in 2025-12-04T10:35:20.6866829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.6866925Z kernel.precompile( 2025-12-04T10:35:20.6867399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.6867499Z self._precompile_worker() 2025-12-04T10:35:20.6868007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.6868161Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.6868681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.6868849Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.6869234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.6869451Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.6869826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.6870158Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.6870354Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6870913Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6870993Z ^ 2025-12-04T10:35:20.6871390Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6871395Z 2025-12-04T10:35:20.6872013Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6872061Z 2025-12-04T10:35:20.6872065Z 2025-12-04T10:35:20.6872245Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6873014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.6873019Z 2025-12-04T10:35:20.6873246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6873426Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6873558Z frames [('total', 1)] 2025-12-04T10:35:20.6873659Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6874059Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6874261Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.6874340Z graph_break [] 2025-12-04T10:35:20.6874526Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6874609Z frames [('total', 1)] 2025-12-04T10:35:20.6874709Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6874895Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6875293Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6875371Z graph_break [] 2025-12-04T10:35:20.6875555Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.6875640Z frames [('total', 1)] 2025-12-04T10:35:20.6875869Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.6876052Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.6876445Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.6876531Z graph_break [] 2025-12-04T10:35:20.6877093Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.xml - 2025-12-04T10:35:20.6877243Z =========================== short test summary info ============================ 2025-12-04T10:35:20.6877986Z FAILED [0.3421s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.6878541Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.6878619Z ^ 2025-12-04T10:35:20.6879014Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.6879021Z 2025-12-04T10:35:20.6879636Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.6879641Z 2025-12-04T10:35:20.6879645Z 2025-12-04T10:35:20.6879893Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.6880646Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.6880651Z 2025-12-04T10:35:20.6880884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.6881039Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.6881212Z ================== 1 failed, 187 deselected, 2 rerun in 2.53s ================== 2025-12-04T10:35:20.6881338Z Got exit code 1 2025-12-04T10:35:20.6881426Z Retrying single test... 
2025-12-04T10:35:20.6881831Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.xml 2025-12-04T10:35:20.6881966Z ============================= test session starts ============================== 2025-12-04T10:35:20.6882264Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.6882363Z cachedir: .pytest_cache 2025-12-04T10:35:20.6882808Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.6882911Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.6883044Z configfile: pytest.ini 2025-12-04T10:35:20.6883506Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.6883704Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.6884396Z stepcurrent: skipping 36 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda
2025-12-04T10:35:20.6884495Z Running 1 items in this shard
2025-12-04T10:35:20.6884500Z
2025-12-04T10:35:20.6885668Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6886667Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6887054Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192
2025-12-04T10:35:20.6887436Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096
2025-12-04T10:35:20.6887829Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.6888281Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6888750Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6889248Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6889745Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6890228Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6890650Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.6891020Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.6891526Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6892020Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6892538Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6893069Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6893525Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6893973Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6894384Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.6894797Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.6895231Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.6895899Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6896349Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6896856Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6897472Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6898025Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6898370Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] )
2025-12-04T10:35:20.6898887Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6899455Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6900003Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6900603Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6901017Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6901422Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6901830Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6902420Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6902872Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6903336Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6903832Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6904337Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6904781Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6905206Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.6905618Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.6906014Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.6906727Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6907176Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6907602Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6908324Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0
2025-12-04T10:35:20.6908778Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6909171Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05
2025-12-04T10:35:20.6909681Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6910149Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6910567Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6911013Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6911523Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6912012Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6912500Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6912923Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6913311Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0
2025-12-04T10:35:20.6913869Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6914256Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0
2025-12-04T10:35:20.6914750Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6915215Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6915725Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6916272Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6916735Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6917044Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.6919058Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6919587Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6920476Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6921088Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6921848Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6922429Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6923179Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6923836Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6924366Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6925297Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6925609Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6926415Z E1204 10:27:56.449000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6926535Z ('RERUN', {'yellow': True}) [1.7976s] [100%]
2025-12-04T10:35:20.6927697Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6928675Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6929047Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192
2025-12-04T10:35:20.6929423Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096
2025-12-04T10:35:20.6929816Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.6930268Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6930856Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6931351Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6931846Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6932320Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6932695Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.6933062Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.6933612Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6934110Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6934625Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6935116Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6935571Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6936023Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6936436Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.6936850Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.6937242Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.6937946Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6938389Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6938889Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6939547Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6940108Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6940451Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] )
2025-12-04T10:35:20.6940967Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6941469Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6942058Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6942658Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6943065Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6943466Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6943869Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6944403Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6944897Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6945366Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6945853Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6946308Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6946751Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6947168Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.6947574Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.6947967Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.6948637Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6949124Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6949559Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6949944Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0
2025-12-04T10:35:20.6950373Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6950802Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05
2025-12-04T10:35:20.6951219Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6951688Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6952110Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6952551Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6953059Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6953590Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6954075Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6954497Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6954893Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0
2025-12-04T10:35:20.6955378Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6955807Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0
2025-12-04T10:35:20.6956307Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6956763Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6957272Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6957763Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6958222Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.6958534Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.6960580Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.6961044Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.6961938Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.6962526Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.6963284Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.6963869Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.6964620Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.6965350Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.6965914Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.6966864Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6967173Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.6967979Z E1204 10:27:56.821000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.6968096Z ('RERUN', {'yellow': True}) [0.3398s] [100%]
2025-12-04T10:35:20.6969255Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0
2025-12-04T10:35:20.6970188Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr):
2025-12-04T10:35:20.6970565Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192
2025-12-04T10:35:20.6970945Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096
2025-12-04T10:35:20.6971336Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel
2025-12-04T10:35:20.6971794Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK
2025-12-04T10:35:20.6972253Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.6972790Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
2025-12-04T10:35:20.6973283Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None]
2025-12-04T10:35:20.6973760Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :]
2025-12-04T10:35:20.6974142Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base
2025-12-04T10:35:20.6974557Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.6975065Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6975565Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6976133Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32)
2025-12-04T10:35:20.6976619Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6977113Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6977570Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6977992Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.6978399Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.6978789Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.6979563Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6980011Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.6980518Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6981132Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce(
2025-12-04T10:35:20.6981645Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0
2025-12-04T10:35:20.6981988Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] )
2025-12-04T10:35:20.6982512Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean)
2025-12-04T10:35:20.6983017Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2)
2025-12-04T10:35:20.6983566Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight)
2025-12-04T10:35:20.6984211Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1)
2025-12-04T10:35:20.6984621Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None]
2025-12-04T10:35:20.6985022Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None]
2025-12-04T10:35:20.6985434Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None]
2025-12-04T10:35:20.6986011Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32)
2025-12-04T10:35:20.6986463Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0))
2025-12-04T10:35:20.6986930Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1])
2025-12-04T10:35:20.6987419Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK):
2025-12-04T10:35:20.6987877Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base
2025-12-04T10:35:20.6988371Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel
2025-12-04T10:35:20.6988791Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset
2025-12-04T10:35:20.6989197Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index
2025-12-04T10:35:20.6989589Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index
2025-12-04T10:35:20.6990254Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32)
2025-12-04T10:35:20.6990699Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32)
2025-12-04T10:35:20.6991166Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3
2025-12-04T10:35:20.6991555Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0
2025-12-04T10:35:20.6991980Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12)
2025-12-04T10:35:20.6992376Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05
2025-12-04T10:35:20.6992792Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14
2025-12-04T10:35:20.6993252Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15)
2025-12-04T10:35:20.6993672Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16
2025-12-04T10:35:20.6994114Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17)
2025-12-04T10:35:20.6994624Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK])
2025-12-04T10:35:20.6995158Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19)
2025-12-04T10:35:20.6995635Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20)
2025-12-04T10:35:20.6996059Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23
2025-12-04T10:35:20.6996463Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0
2025-12-04T10:35:20.6996954Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25)
2025-12-04T10:35:20.6997384Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0
2025-12-04T10:35:20.6997879Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27)
2025-12-04T10:35:20.6998345Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv)
2025-12-04T10:35:20.6998858Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask)
2025-12-04T10:35:20.6999346Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None]
2025-12-04T10:35:20.6999848Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None)
2025-12-04T10:35:20.7000156Z E1204 10:27:57.162000 87830
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.7002201Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7002665Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.7003560Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7004109Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7004863Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7005447Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7006251Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7006954Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7007472Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7008652Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.7008970Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.7009839Z E1204 10:27:57.162000 87830 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7009929Z FAILED [0.3395s] [100%] 2025-12-04T10:35:20.7009934Z 2025-12-04T10:35:20.7010054Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.7010401Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.7010513Z Traceback (most recent call last): 2025-12-04T10:35:20.7010871Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.7011072Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.7011550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7011762Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7012203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7012369Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7012813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7012931Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7013391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7013669Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7014168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7014291Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7014706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T10:35:20.7014803Z return self._compile_to_module() 2025-12-04T10:35:20.7015220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7015357Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7015800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7015911Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7016333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7016541Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7017038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7017143Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7017581Z File "/tmp/tmpdu0qqvj8/bp/cbpkoaotyt6w3t6nhfvncbru7hq5du56hssi4mo7kfhvs2wz4oly.py", line 65, in 2025-12-04T10:35:20.7018037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7018127Z kernel.precompile( 2025-12-04T10:35:20.7018604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7018697Z self._precompile_worker() 2025-12-04T10:35:20.7019255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7019406Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7019910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7020122Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7020501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7020712Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7021085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7021367Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7021563Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7022163Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.7022236Z ^ 2025-12-04T10:35:20.7022637Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7022642Z 2025-12-04T10:35:20.7023247Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7023253Z 2025-12-04T10:35:20.7023257Z 2025-12-04T10:35:20.7023446Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7024202Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7024209Z 2025-12-04T10:35:20.7024479Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7024659Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7024746Z frames [('total', 1)] 2025-12-04T10:35:20.7024841Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.7025245Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7025435Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.7025515Z graph_break [] 2025-12-04T10:35:20.7025861Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.7025966Z Traceback (most recent call last): 2025-12-04T10:35:20.7026321Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.7026512Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.7026931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7027141Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7027584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7027748Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T10:35:20.7028224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7028353Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7028815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7029084Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7029540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7029708Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7030122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7030223Z return self._compile_to_module() 2025-12-04T10:35:20.7030639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7030777Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7031218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7031330Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7031749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7031985Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7032495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7032603Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7033019Z File "/tmp/tmp253oyr_d/sd/csdy5hvu45hpw625y3fiiuwr7p4dczxtmhsvf47xxu3eiw4tjv7f.py", line 65, in 
2025-12-04T10:35:20.7033423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7033516Z kernel.precompile( 2025-12-04T10:35:20.7034001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7034096Z self._precompile_worker() 2025-12-04T10:35:20.7034650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7034805Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7035315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7035489Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7035897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7036129Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7036509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7036801Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7036997Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7037554Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.7037628Z ^ 2025-12-04T10:35:20.7038023Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7038028Z 2025-12-04T10:35:20.7038680Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7038685Z 2025-12-04T10:35:20.7038689Z 2025-12-04T10:35:20.7038879Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7039635Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7039642Z 2025-12-04T10:35:20.7039872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7040057Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7040178Z frames [('total', 1)] 2025-12-04T10:35:20.7040277Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.7040682Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7040871Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.7040960Z graph_break [] 2025-12-04T10:35:20.7041138Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7041220Z frames [('total', 1)] 2025-12-04T10:35:20.7041323Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.7041505Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.7041902Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7042025Z graph_break [] 2025-12-04T10:35:20.7042142Z =================================== FAILURES =================================== 2025-12-04T10:35:20.7042495Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.7042597Z Traceback (most recent call last): 2025-12-04T10:35:20.7042955Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T10:35:20.7043158Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T10:35:20.7043572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7043788Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7044220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7044429Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7044875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7045000Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7045458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7045737Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7046236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7046365Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7046770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7046877Z return self._compile_to_module() 2025-12-04T10:35:20.7047293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7047427Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7047871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7047973Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7048463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7048660Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7049159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7049265Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7049711Z File "/tmp/tmp7x5nkcmi/ri/cribppqv3iczsynsh4fmdqllfgmzb7uflwk3zo7z6svfapfmas3g.py", line 65, in 2025-12-04T10:35:20.7050106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7050243Z kernel.precompile( 2025-12-04T10:35:20.7050719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7050820Z self._precompile_worker() 2025-12-04T10:35:20.7051333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7051482Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7051988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7052148Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7052656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7052866Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.7053244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7053533Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7053726Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7054280Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.7054359Z ^ 2025-12-04T10:35:20.7054750Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7054758Z 2025-12-04T10:35:20.7055407Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7055415Z 2025-12-04T10:35:20.7055419Z 2025-12-04T10:35:20.7055598Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7056353Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7056362Z 2025-12-04T10:35:20.7056583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7056761Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7056851Z frames [('total', 1)] 2025-12-04T10:35:20.7056945Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.7057346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7057533Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T10:35:20.7061401Z graph_break [] 2025-12-04T10:35:20.7061608Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7061704Z frames [('total', 1)] 2025-12-04T10:35:20.7061801Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.7061997Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.7062472Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7062553Z graph_break [] 2025-12-04T10:35:20.7062741Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7062830Z frames [('total', 1)] 2025-12-04T10:35:20.7062929Z stats [('calls_captured', 10)] 2025-12-04T10:35:20.7063131Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.7063524Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7063653Z graph_break [] 2025-12-04T10:35:20.7064215Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.xml - 2025-12-04T10:35:20.7064360Z =========================== short test summary info ============================ 2025-12-04T10:35:20.7065115Z FAILED [0.3395s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7065669Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T10:35:20.7065791Z ^ 2025-12-04T10:35:20.7066186Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7066192Z 2025-12-04T10:35:20.7066802Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7066807Z 2025-12-04T10:35:20.7066817Z 2025-12-04T10:35:20.7067001Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7067761Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7067766Z 2025-12-04T10:35:20.7067996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7068153Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.7068366Z ================== 1 failed, 187 deselected, 2 rerun in 2.51s ================== 2025-12-04T10:35:20.7068458Z Got exit code 1 2025-12-04T10:35:20.7069003Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7069364Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.7069772Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.xml 2025-12-04T10:35:20.7069909Z ============================= test session starts ============================== 2025-12-04T10:35:20.7070213Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.7070306Z cachedir: .pytest_cache 2025-12-04T10:35:20.7070760Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T10:35:20.7070868Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.7070958Z configfile: pytest.ini 2025-12-04T10:35:20.7071425Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.7071620Z collecting ... collected 188 items / 37 deselected / 151 selected 2025-12-04T10:35:20.7071742Z stepcurrent: skipping 37 already run items. 2025-12-04T10:35:20.7071849Z Running 151 items in this shard 2025-12-04T10:35:20.7071854Z 2025-12-04T10:35:20.7072385Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda PASSED [1.9819s] [ 0%] 2025-12-04T10:35:20.7072882Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.5899s] [ 1%] 2025-12-04T10:35:20.7073369Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.7384s] [ 1%] 2025-12-04T10:35:20.7073862Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.7400s] [ 2%] 2025-12-04T10:35:20.7074404Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED [0.9962s] [ 3%] 2025-12-04T10:35:20.7074879Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda PASSED [0.6069s] [ 3%] 2025-12-04T10:35:20.7075363Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.6624s] [ 4%] 2025-12-04T10:35:20.7075847Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [0.9381s] [ 5%] 2025-12-04T10:35:20.7076325Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.6684s] [ 5%] 2025-12-04T10:35:20.7076863Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [1.0237s] [ 6%] 2025-12-04T10:35:20.7077375Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [1.0386s] [ 7%] 2025-12-04T10:35:20.7077891Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [0.9288s] [ 7%] 2025-12-04T10:35:20.7078332Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda FAILED [0.8715s] [ 7%] 2025-12-04T10:35:20.7078337Z 2025-12-04T10:35:20.7078461Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.7078732Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7078840Z Traceback (most recent call last): 2025-12-04T10:35:20.7079227Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7079353Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7079777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7079995Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7080434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7080606Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7081039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T10:35:20.7081156Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7081622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7081895Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7082346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7082473Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7082925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7083037Z return self._compile_to_module() 2025-12-04T10:35:20.7083447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7083583Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7084030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7084145Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7084576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7084810Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7085307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7085418Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7085863Z File "/tmp/tmp9avqyx1k/n7/cn7cjwsdmcygywdycdpfllorkspoj6wasj2mpbw3p5frzx6xcdqh.py", line 84, in 2025-12-04T10:35:20.7086300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 
2025-12-04T10:35:20.7086399Z self._wait_futures(scope) 2025-12-04T10:35:20.7086819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.7086966Z kernel = result.result() 2025-12-04T10:35:20.7087339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.7087436Z return self.result_fn() 2025-12-04T10:35:20.7087847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.7087956Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.7088303Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.7088308Z 2025-12-04T10:35:20.7088415Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7088515Z Traceback (most recent call last): 2025-12-04T10:35:20.7088982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.7089068Z result = job() 2025-12-04T10:35:20.7089647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.7089766Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.7090243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.7090350Z self._precompile_worker() 2025-12-04T10:35:20.7090861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7091012Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7091523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in 
_precompile_config 2025-12-04T10:35:20.7091695Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7092085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7092296Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7092679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7092975Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7093131Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7093441Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7093516Z ^ 2025-12-04T10:35:20.7093907Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7093912Z 2025-12-04T10:35:20.7093916Z 2025-12-04T10:35:20.7094539Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7094549Z 2025-12-04T10:35:20.7094553Z 2025-12-04T10:35:20.7094733Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7095470Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7095475Z 2025-12-04T10:35:20.7095702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7095912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7096014Z frames [('total', 1)] 2025-12-04T10:35:20.7096129Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7096322Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7096827Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.7096950Z graph_break [] 2025-12-04T10:35:20.7097231Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7097332Z Traceback (most recent call last): 2025-12-04T10:35:20.7097677Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7097806Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7098222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7098442Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7098883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7099102Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7099599Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7099721Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7100188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7100468Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7100913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7101043Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7101452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7101553Z return self._compile_to_module() 2025-12-04T10:35:20.7101972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7102115Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7102561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7102675Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7103089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7103336Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7103839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7103957Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7104400Z File "/tmp/tmpe2ov3yrl/sz/cszzy7yacw2o5jetxjtv3zrfddaibkyxxxvpfobhtqjhc5ahhbv2.py", line 84, in <module> 2025-12-04T10:35:20.7104784Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.7104887Z self._wait_futures(scope) 2025-12-04T10:35:20.7105306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.7105450Z kernel = result.result() 2025-12-04T10:35:20.7105883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.7105978Z return self.result_fn() 2025-12-04T10:35:20.7106393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.7106504Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.7106836Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.7106841Z 2025-12-04T10:35:20.7106960Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7107103Z Traceback (most recent call last): 2025-12-04T10:35:20.7107568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.7107653Z result = job() 2025-12-04T10:35:20.7108318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.7108441Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.7108913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.7109012Z self._precompile_worker() 2025-12-04T10:35:20.7109523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7109675Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7110263Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7110433Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7110816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7111029Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7111406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7111695Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7111860Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7112119Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7112196Z ^ 2025-12-04T10:35:20.7112585Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7112594Z 2025-12-04T10:35:20.7112598Z 2025-12-04T10:35:20.7113219Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7113226Z 2025-12-04T10:35:20.7113229Z 2025-12-04T10:35:20.7113415Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7114176Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7114181Z 2025-12-04T10:35:20.7114426Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7114611Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7114707Z frames [('total', 1)] 2025-12-04T10:35:20.7114809Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7115002Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7115517Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.7115669Z graph_break [] 2025-12-04T10:35:20.7115853Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7115951Z frames [('total', 1)] 2025-12-04T10:35:20.7116048Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7116253Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7116755Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.7116837Z graph_break [] 2025-12-04T10:35:20.7116970Z =================================== FAILURES =================================== 2025-12-04T10:35:20.7117311Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7117416Z Traceback (most recent call last): 2025-12-04T10:35:20.7117788Z File 
"/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7117914Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7118347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7118570Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7119014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7119187Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7119630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7119809Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7120275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7120559Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7121014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7121147Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7121565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7121679Z return self._compile_to_module() 2025-12-04T10:35:20.7122099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7122245Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7122692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7122807Z mod = 
PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7123249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7123449Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7124012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7124119Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7124559Z File "/tmp/tmp5xc3wj4l/yl/cylb2kn5kngs6ygqehp4cszn7o7dv4palhjl66g5zmdghdtn57w2.py", line 84, in <module> 2025-12-04T10:35:20.7124954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.7125056Z self._wait_futures(scope) 2025-12-04T10:35:20.7125490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.7125649Z kernel = result.result() 2025-12-04T10:35:20.7126078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.7126186Z return self.result_fn() 2025-12-04T10:35:20.7126605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.7126715Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.7127057Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.7127062Z 2025-12-04T10:35:20.7127173Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7127280Z Traceback (most recent call last): 2025-12-04T10:35:20.7127742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.7127891Z result = job() 2025-12-04T10:35:20.7128401Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.7128521Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.7128997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.7129106Z self._precompile_worker() 2025-12-04T10:35:20.7129613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7129771Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7130280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7130496Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7130897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7131109Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7131506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7131800Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7131961Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7132234Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7132308Z ^ 2025-12-04T10:35:20.7132708Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7132712Z 2025-12-04T10:35:20.7132718Z 2025-12-04T10:35:20.7133338Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7133346Z 2025-12-04T10:35:20.7133350Z 2025-12-04T10:35:20.7133530Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7134272Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7134278Z 2025-12-04T10:35:20.7134507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7134702Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7134786Z frames [('total', 1)] 2025-12-04T10:35:20.7134881Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7135085Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7135596Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.7135719Z graph_break [] 2025-12-04T10:35:20.7135907Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7135996Z frames [('total', 1)] 2025-12-04T10:35:20.7136096Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7136288Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7136793Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.7136883Z graph_break [] 2025-12-04T10:35:20.7137062Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7137151Z frames [('total', 1)] 2025-12-04T10:35:20.7137252Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7137485Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 
2025-12-04T10:35:20.7137987Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.7138075Z graph_break [] 2025-12-04T10:35:20.7138633Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.xml - 2025-12-04T10:35:20.7138787Z =========================== short test summary info ============================ 2025-12-04T10:35:20.7139638Z FAILED [0.8715s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.7139644Z 2025-12-04T10:35:20.7139757Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7139861Z Traceback (most recent call last): 2025-12-04T10:35:20.7140378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.7140476Z result = job() 2025-12-04T10:35:20.7140983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.7141104Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.7141588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.7141684Z self._precompile_worker() 2025-12-04T10:35:20.7142204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7142356Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7142872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 
2025-12-04T10:35:20.7143052Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7143436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7143650Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7144033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7144362Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7144526Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7144787Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7144856Z ^ 2025-12-04T10:35:20.7145254Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7145262Z 2025-12-04T10:35:20.7145268Z 2025-12-04T10:35:20.7145879Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7145924Z 2025-12-04T10:35:20.7145928Z 2025-12-04T10:35:20.7146118Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7146822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7146827Z 2025-12-04T10:35:20.7147060Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7147257Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:35:20.7147510Z ============ 1 failed, 10 passed, 37 deselected, 2 rerun in 11.83s ============= 2025-12-04T10:35:20.7147686Z Got exit code 1 2025-12-04T10:35:20.7147787Z Retrying single test... 2025-12-04T10:35:20.7148200Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.xml 2025-12-04T10:35:20.7148339Z ============================= test session starts ============================== 2025-12-04T10:35:20.7148639Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.7148743Z cachedir: .pytest_cache 2025-12-04T10:35:20.7149195Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.7149296Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.7149396Z configfile: pytest.ini 2025-12-04T10:35:20.7149854Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.7150053Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.7150715Z stepcurrent: skipping 47 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7150814Z Running 1 items in this shard 2025-12-04T10:35:20.7150819Z 2025-12-04T10:35:20.7151813Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7152456Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7152934Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7153418Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7153842Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7154219Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7154769Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7155212Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7155589Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7156078Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 
triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7156456Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7156981Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7157421Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7157869Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7158337Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7158643Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7160118Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7160590Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7161479Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7162061Z E1204 10:28:26.848000 88886 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7162827Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7163418Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7164277Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7164943Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7165472Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7166165Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7166483Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7167293Z E1204 10:28:26.848000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7167413Z ('RERUN', {'yellow': True}) [2.2361s] [100%] 2025-12-04T10:35:20.7168385Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7169025Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7169561Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7170040Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7170469Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7170845Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7171412Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7171844Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7172223Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7172711Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7173082Z E1204 10:28:27.479000 88886 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7173568Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7174043Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7174504Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7174979Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7175288Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7176729Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7177190Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7178088Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7178662Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.7179474Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7180060Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7180815Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7181522Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7182048Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7182688Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7182995Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7183802Z E1204 10:28:27.479000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7183916Z ('RERUN', {'yellow': True}) [0.5971s] [100%] 2025-12-04T10:35:20.7184890Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7185535Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7186051Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7186579Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7187008Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7187374Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7187884Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7188316Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7188695Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7189182Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7189553Z E1204 10:28:28.073000 88886 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7190034Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7190507Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7190965Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7191432Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7191734Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7193160Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7193662Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7194554Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7195094Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.7195925Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7196529Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7197290Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7197947Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7198505Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7199149Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7199458Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7200224Z E1204 10:28:28.073000 88886 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7200306Z FAILED [0.5922s] [100%] 2025-12-04T10:35:20.7200311Z 2025-12-04T10:35:20.7200432Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.7200705Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7200806Z Traceback (most recent call last): 2025-12-04T10:35:20.7201152Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7201271Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7201684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7201941Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7202376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7202540Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7202972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7203092Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7203549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7203863Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7204310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7204430Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7204836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.7204939Z return self._compile_to_module() 2025-12-04T10:35:20.7205347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7205479Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7205967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7206073Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7206498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7206690Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7207189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7207299Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7207896Z File "/tmp/tmp5ypzm6cg/wx/cwxpqc56k7bujjofl7t3w4pan3irgnswi2rtqz5sc6zd5obkzjny.py", line 50, in 2025-12-04T10:35:20.7208298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7208393Z kernel.precompile( 2025-12-04T10:35:20.7208948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7209052Z self._precompile_worker() 2025-12-04T10:35:20.7209556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7209707Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7210215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7210378Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T10:35:20.7210762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7210965Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7211340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7211632Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7211822Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7212084Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7212152Z ^ 2025-12-04T10:35:20.7212628Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7212634Z 2025-12-04T10:35:20.7213243Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7213248Z 2025-12-04T10:35:20.7213252Z 2025-12-04T10:35:20.7213431Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7214120Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7214184Z 2025-12-04T10:35:20.7214411Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7214589Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7214672Z frames [('total', 1)] 2025-12-04T10:35:20.7214765Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7215177Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7215363Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7215440Z graph_break [] 2025-12-04T10:35:20.7215713Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7215810Z Traceback (most recent call last): 2025-12-04T10:35:20.7216215Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7216338Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7216748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7216958Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7217401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7217560Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7217995Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7218117Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7218608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7218887Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7219369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7219494Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7219897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7220001Z return self._compile_to_module() 2025-12-04T10:35:20.7220413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7220545Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7220983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7221093Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7221516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7221715Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7222210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7222315Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7222797Z File "/tmp/tmpgn8idbvh/4m/c4mi2hok7cfuhktrq6d33hzuiuewjdvalubwy5eqqbafwvdo2jxz.py", line 50, in 2025-12-04T10:35:20.7223191Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7223289Z kernel.precompile( 2025-12-04T10:35:20.7223758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7223858Z self._precompile_worker() 2025-12-04T10:35:20.7224366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7224642Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7225145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7225315Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7225691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7225899Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7226275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7226606Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7226801Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7227059Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7227129Z ^ 2025-12-04T10:35:20.7227515Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7227520Z 2025-12-04T10:35:20.7228124Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7228133Z 2025-12-04T10:35:20.7228137Z 2025-12-04T10:35:20.7228315Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7228999Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7229048Z 2025-12-04T10:35:20.7229283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7229473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7229563Z frames [('total', 1)] 2025-12-04T10:35:20.7229654Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7230055Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7230250Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7230331Z graph_break [] 2025-12-04T10:35:20.7230505Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7230589Z frames [('total', 1)] 2025-12-04T10:35:20.7230677Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7230857Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7231265Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7231347Z graph_break [] 2025-12-04T10:35:20.7231473Z =================================== FAILURES =================================== 2025-12-04T10:35:20.7231744Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7231844Z Traceback (most recent call last): 2025-12-04T10:35:20.7232239Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in 
test_to_fp8_saturated 2025-12-04T10:35:20.7232361Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7232774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7232981Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7233417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7233581Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7234058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7234173Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7234630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7234902Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7235349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7235468Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7235871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7236016Z return self._compile_to_module() 2025-12-04T10:35:20.7236428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7236572Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7237008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7237114Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7237543Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7237735Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7238230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7238337Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7238808Z File "/tmp/tmplz2z23p5/rp/crpdk6ftmt6tdgl75i7yffvgnapth7536doixdmeu3ekc7d3fex3.py", line 50, in 2025-12-04T10:35:20.7239214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7239302Z kernel.precompile( 2025-12-04T10:35:20.7239773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7239877Z self._precompile_worker() 2025-12-04T10:35:20.7240382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7240532Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7241034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7241200Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7241579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7241784Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7242164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7242495Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T10:35:20.7242686Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7242944Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7243012Z ^ 2025-12-04T10:35:20.7243398Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7243409Z 2025-12-04T10:35:20.7244019Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7244065Z 2025-12-04T10:35:20.7244068Z 2025-12-04T10:35:20.7244253Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7244934Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7244940Z 2025-12-04T10:35:20.7245163Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7245343Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7245423Z frames [('total', 1)] 2025-12-04T10:35:20.7245516Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7245922Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7246149Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7246227Z graph_break [] 2025-12-04T10:35:20.7246409Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7246493Z frames [('total', 1)] 2025-12-04T10:35:20.7246589Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7246773Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7247167Z inductor 
[('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7247247Z graph_break [] 2025-12-04T10:35:20.7247421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7247498Z frames [('total', 1)] 2025-12-04T10:35:20.7247592Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7247772Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7248237Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7248319Z graph_break [] 2025-12-04T10:35:20.7248876Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.xml - 2025-12-04T10:35:20.7249024Z =========================== short test summary info ============================ 2025-12-04T10:35:20.7249695Z FAILED [0.5922s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7249955Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7250028Z ^ 2025-12-04T10:35:20.7250415Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7250422Z 2025-12-04T10:35:20.7251035Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7251041Z
2025-12-04T10:35:20.7251045Z
2025-12-04T10:35:20.7251220Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7251948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7251953Z
2025-12-04T10:35:20.7252175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7252322Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7252493Z ================== 1 failed, 187 deselected, 2 rerun in 3.46s ==================
2025-12-04T10:35:20.7252571Z Got exit code 1
2025-12-04T10:35:20.7252659Z Retrying single test...
2025-12-04T10:35:20.7253070Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.xml
2025-12-04T10:35:20.7253246Z ============================= test session starts ==============================
2025-12-04T10:35:20.7253538Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7253628Z cachedir: .pytest_cache
2025-12-04T10:35:20.7254079Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7254188Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7254275Z configfile: pytest.ini
2025-12-04T10:35:20.7254733Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7254918Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.7255574Z stepcurrent: skipping 47 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7255670Z Running 1 items in this shard
2025-12-04T10:35:20.7255675Z
2025-12-04T10:35:20.7256701Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7257350Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7257813Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7258330Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7258758Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel
2025-12-04T10:35:20.7259224Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex
2025-12-04T10:35:20.7259734Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7260168Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7260543Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0
2025-12-04T10:35:20.7261035Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7261410Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0
2025-12-04T10:35:20.7261896Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7262327Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7262814Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7263283Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7263586Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.7265022Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7265527Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7266471Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7267007Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7267816Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7268397Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7269148Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7269809Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7270370Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7271013Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7271323Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7272092Z E1204 10:28:37.623000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7272197Z ('RERUN', {'yellow': True}) [2.2297s] [100%]
2025-12-04T10:35:20.7273170Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7273818Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7274282Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7274801Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7275222Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel
2025-12-04T10:35:20.7275661Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex
2025-12-04T10:35:20.7276223Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7276656Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7277079Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0
2025-12-04T10:35:20.7277560Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7277934Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0
2025-12-04T10:35:20.7278415Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7278843Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7279340Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7279806Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7280115Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.7281541Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7282036Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7282940Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7283484Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7284247Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7284822Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7285578Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7286290Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7286851Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7287486Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7287794Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7288571Z E1204 10:28:38.251000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7288719Z ('RERUN', {'yellow': True}) [0.5946s] [100%]
2025-12-04T10:35:20.7289698Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7290336Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7290801Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7291284Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7291769Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel
2025-12-04T10:35:20.7292141Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex
2025-12-04T10:35:20.7292650Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7293086Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7293462Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0
2025-12-04T10:35:20.7293941Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7294361Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0
2025-12-04T10:35:20.7294844Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7295280Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7295728Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7296193Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7296501Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.7297930Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7298438Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7299378Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7299911Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7300671Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7301288Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7302049Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7302704Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7303270Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7303902Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7304215Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7304982Z E1204 10:28:38.845000 89068 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7305061Z FAILED [0.5923s] [100%]
2025-12-04T10:35:20.7305066Z
2025-12-04T10:35:20.7305184Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7305455Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7305604Z Traceback (most recent call last):
2025-12-04T10:35:20.7305948Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7306070Z y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7306485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7306692Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7307128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7307291Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7307721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7307986Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7308447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7308718Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7309163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7309283Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7309760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7309859Z return self._compile_to_module()
2025-12-04T10:35:20.7310267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7310402Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7310839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7310950Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7311366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7311617Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7312115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7312220Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7312639Z File "/tmp/tmpahwq0k6_/kr/ckrqtec7h2xh5cyp43uhor2apoc2btydbltiilfqv2mgcp3uc3ou.py", line 50, in <module>
2025-12-04T10:35:20.7313031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7313117Z kernel.precompile(
2025-12-04T10:35:20.7313586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7313739Z self._precompile_worker()
2025-12-04T10:35:20.7314245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7314393Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7314897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7315061Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7315439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7315654Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7316062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7316406Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7316603Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7316860Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7316927Z ^
2025-12-04T10:35:20.7317322Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7317326Z
2025-12-04T10:35:20.7317935Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7317940Z
2025-12-04T10:35:20.7317944Z
2025-12-04T10:35:20.7318125Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7318810Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7318820Z
2025-12-04T10:35:20.7319042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7319221Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7319304Z frames [('total', 1)]
2025-12-04T10:35:20.7319395Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7319840Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7320026Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7320106Z graph_break []
2025-12-04T10:35:20.7323951Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7324078Z Traceback (most recent call last):
2025-12-04T10:35:20.7324441Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7324567Z y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7324988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7325272Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7325709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7325883Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7326315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7326438Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7326898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7327224Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7327675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7327796Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7328206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7328313Z return self._compile_to_module()
2025-12-04T10:35:20.7328726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7328865Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7329308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7329419Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7329895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7330099Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7330595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7330708Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7331146Z File "/tmp/tmp8eglbmbs/bf/cbfe35rfnvfutp2nzixkzreaq26dk3k4gskjg556ft4wznm5elmy.py", line 50, in <module>
2025-12-04T10:35:20.7331552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7331646Z kernel.precompile(
2025-12-04T10:35:20.7332123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7332232Z self._precompile_worker()
2025-12-04T10:35:20.7332739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7332891Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7333407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7333576Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7334008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7334217Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7334589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7334875Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7335075Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7335337Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7335482Z ^
2025-12-04T10:35:20.7335922Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7335928Z
2025-12-04T10:35:20.7336552Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7336557Z
2025-12-04T10:35:20.7336561Z
2025-12-04T10:35:20.7336744Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7337432Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7337479Z
2025-12-04T10:35:20.7337706Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7337888Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7337983Z frames [('total', 1)]
2025-12-04T10:35:20.7338078Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7338485Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7338676Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7338761Z graph_break []
2025-12-04T10:35:20.7338950Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7339092Z frames [('total', 1)]
2025-12-04T10:35:20.7339188Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7339378Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7339819Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7339909Z graph_break []
2025-12-04T10:35:20.7340034Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7340305Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7340420Z Traceback (most recent call last):
2025-12-04T10:35:20.7340763Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7340889Z y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7341304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7341514Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7341953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7342121Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7342552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7342678Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7343129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7343451Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7343898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7344020Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7344433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7344532Z return self._compile_to_module()
2025-12-04T10:35:20.7344948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7345133Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7345573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7345702Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7346162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7346354Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7346862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7346965Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7347452Z File "/tmp/tmp1djeka8t/tq/ctqdg7vdjzxvazwb4l25rkhb26l3llguhnyebxci7dobe7fnxexh.py", line 50, in <module>
2025-12-04T10:35:20.7347843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7347936Z kernel.precompile(
2025-12-04T10:35:20.7348409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7348511Z self._precompile_worker()
2025-12-04T10:35:20.7349021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7349177Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7349688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7349861Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7350287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7350497Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7350876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7351162Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7351362Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7351623Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7351696Z ^
2025-12-04T10:35:20.7352096Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7352101Z
2025-12-04T10:35:20.7352708Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7352716Z
2025-12-04T10:35:20.7352722Z
2025-12-04T10:35:20.7352905Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7353589Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7353593Z
2025-12-04T10:35:20.7353862Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7354050Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7354136Z frames [('total', 1)]
2025-12-04T10:35:20.7354237Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7354636Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7354835Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7354921Z graph_break []
2025-12-04T10:35:20.7355100Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7355227Z frames [('total', 1)]
2025-12-04T10:35:20.7355333Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7355518Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7355967Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7356057Z graph_break []
2025-12-04T10:35:20.7356235Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7356330Z frames [('total', 1)]
2025-12-04T10:35:20.7356424Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7356610Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7357009Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7357136Z graph_break []
2025-12-04T10:35:20.7357696Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.xml -
2025-12-04T10:35:20.7357852Z =========================== short test summary info ============================
2025-12-04T10:35:20.7358530Z FAILED [0.5923s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7358803Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7358880Z ^
2025-12-04T10:35:20.7359277Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7359288Z
2025-12-04T10:35:20.7359936Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7359944Z
2025-12-04T10:35:20.7359950Z
2025-12-04T10:35:20.7360136Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7360824Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7360829Z
2025-12-04T10:35:20.7361055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7361210Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7361382Z ================== 1 failed, 187 deselected, 2 rerun in 3.45s ==================
2025-12-04T10:35:20.7361460Z Got exit code 1
2025-12-04T10:35:20.7361942Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7362296Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.7362708Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.xml
2025-12-04T10:35:20.7362842Z ============================= test session starts ==============================
2025-12-04T10:35:20.7363140Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7363289Z cachedir: .pytest_cache
2025-12-04T10:35:20.7363737Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7363841Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7363939Z configfile: pytest.ini
2025-12-04T10:35:20.7364403Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7364605Z collecting ... collected 188 items / 48 deselected / 140 selected 2025-12-04T10:35:20.7364725Z stepcurrent: skipping 48 already run items. 2025-12-04T10:35:20.7364863Z Running 140 items in this shard 2025-12-04T10:35:20.7364868Z 2025-12-04T10:35:20.7365873Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7366514Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7366991Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7367524Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7367950Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7368323Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7368826Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7369261Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7369647Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7370174Z E1204 10:28:48.308000 89250 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7370563Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7371046Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7371483Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7371930Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7372405Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7372709Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7374145Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7374672Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7375569Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7376113Z E1204 
10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7376874Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7377499Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7378252Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7378917Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7379532Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7380178Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7380500Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7381280Z E1204 10:28:48.308000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7381404Z ('RERUN', {'yellow': True}) [2.1332s] [ 0%] 2025-12-04T10:35:20.7382440Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7383085Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7383565Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7384047Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7384480Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7384850Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7385381Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7385870Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7386258Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7386793Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7387176Z E1204 10:28:48.952000 89250 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7387673Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7388111Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7388572Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7389095Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7389401Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7390843Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7391350Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7392259Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7392809Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.7393573Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7394175Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7395526Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7396203Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7396730Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7397382Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7397695Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7398476Z E1204 10:28:48.952000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7398603Z ('RERUN', {'yellow': True}) [0.6094s] [ 0%] 2025-12-04T10:35:20.7399727Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7400371Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7400839Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7401331Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7401795Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7402162Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7402691Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7403123Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7403506Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7404032Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7404403Z E1204 10:28:49.566000 89250 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7404893Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7405330Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7405794Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7406262Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7406575Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7408276Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7408747Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7409642Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7410182Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.7410942Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7411582Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7412346Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7413006Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7413529Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7414231Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7414547Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7415322Z E1204 10:28:49.566000 89250 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7415410Z FAILED [0.6123s] [ 0%] 2025-12-04T10:35:20.7415415Z 2025-12-04T10:35:20.7415537Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.7415881Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.7415985Z Traceback (most recent call last): 2025-12-04T10:35:20.7416336Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7416460Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7416871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7417093Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7417528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7417695Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7418127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7418249Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7418772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7419092Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7419552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7419672Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7420079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.7420186Z return self._compile_to_module() 2025-12-04T10:35:20.7420594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7420727Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7421172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7421279Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7421707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7421898Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7422441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7422555Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7422966Z File "/tmp/tmpe_auaqvz/zz/czz57o3q7co2okbx6hidugeqxaewtskj35xsxxfv4jed6ihd3mas.py", line 50, in <module> 2025-12-04T10:35:20.7423364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7423456Z kernel.precompile( 2025-12-04T10:35:20.7423930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7424068Z self._precompile_worker() 2025-12-04T10:35:20.7424575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7424723Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7425234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7425402Z binary = triton.compile(*compile_args,
**compile_kwargs) 2025-12-04T10:35:20.7425781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7425985Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7426415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7426707Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7426902Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7427161Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7427233Z ^ 2025-12-04T10:35:20.7427623Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7427627Z 2025-12-04T10:35:20.7428243Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7428248Z 2025-12-04T10:35:20.7428252Z 2025-12-04T10:35:20.7428436Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7429188Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7429196Z 2025-12-04T10:35:20.7429423Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7429604Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7429697Z frames [('total', 1)] 2025-12-04T10:35:20.7429791Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7430198Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7430389Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7430468Z graph_break [] 2025-12-04T10:35:20.7430760Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.7430867Z Traceback (most recent call last): 2025-12-04T10:35:20.7431214Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7431345Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7431769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7431988Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7432469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7432631Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7433078Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7433195Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7433654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7433930Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7434411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7434544Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7434954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7435050Z return self._compile_to_module() 2025-12-04T10:35:20.7435468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7435611Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7436059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7436218Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7436640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7436843Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7437339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7437442Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7439308Z File "/tmp/tmpgym3s0re/zq/czq2afu4t524vvkyiy5lt74i32ciwkh5tj7hotnwbhmkftpyciwg.py", line 50, in <module> 2025-12-04T10:35:20.7439704Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7439803Z kernel.precompile( 2025-12-04T10:35:20.7440273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7440418Z self._precompile_worker() 2025-12-04T10:35:20.7440930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7441080Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7441598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7441763Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7442141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7442356Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7442726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7443018Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7443207Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7443470Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7443549Z ^ 2025-12-04T10:35:20.7443935Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7443940Z 2025-12-04T10:35:20.7444591Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7444601Z
2025-12-04T10:35:20.7444605Z
2025-12-04T10:35:20.7444789Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7445491Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7445498Z
2025-12-04T10:35:20.7445738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7445997Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7446088Z frames [('total', 1)]
2025-12-04T10:35:20.7446182Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7446584Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7446778Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7446857Z graph_break []
2025-12-04T10:35:20.7447033Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7447120Z frames [('total', 1)]
2025-12-04T10:35:20.7447213Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7447401Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7447845Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7447928Z graph_break []
2025-12-04T10:35:20.7448050Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7448334Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7448438Z Traceback (most recent call last):
2025-12-04T10:35:20.7448792Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7448913Z y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7449335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7449546Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7450031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7450200Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7450635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7450752Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7451217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7451490Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7451935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7452055Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7452464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7452567Z return self._compile_to_module()
2025-12-04T10:35:20.7452974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7453112Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7453550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7453701Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7454121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7454313Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7454815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7454932Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7455362Z File "/tmp/tmpcf9tmsnl/xk/cxk43d4xa2sy7vpd5g6fnl3uknwvgx5l67o5ohjyql2wtgaj3dcp.py", line 50, in <module>
2025-12-04T10:35:20.7455800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7455888Z kernel.precompile(
2025-12-04T10:35:20.7456357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7456456Z self._precompile_worker()
2025-12-04T10:35:20.7456963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7457114Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7457618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7457853Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7458237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7458442Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7458818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7459159Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7459348Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7459614Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7459681Z ^
2025-12-04T10:35:20.7460065Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7460072Z
2025-12-04T10:35:20.7460729Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7460736Z
2025-12-04T10:35:20.7460740Z
2025-12-04T10:35:20.7460919Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7461619Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7461625Z
2025-12-04T10:35:20.7461849Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7462026Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7462112Z frames [('total', 1)]
2025-12-04T10:35:20.7462203Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7462602Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7462789Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7462869Z graph_break []
2025-12-04T10:35:20.7463049Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7463130Z frames [('total', 1)]
2025-12-04T10:35:20.7463221Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7463404Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7463842Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7463927Z graph_break []
2025-12-04T10:35:20.7464102Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7464183Z frames [('total', 1)]
2025-12-04T10:35:20.7464280Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7464462Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7464854Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7464981Z graph_break []
2025-12-04T10:35:20.7465535Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.xml -
2025-12-04T10:35:20.7465695Z =========================== short test summary info ============================
2025-12-04T10:35:20.7466414Z FAILED [0.6123s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7466670Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7466742Z ^
2025-12-04T10:35:20.7467128Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7467174Z
2025-12-04T10:35:20.7467791Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7467799Z
2025-12-04T10:35:20.7467803Z
2025-12-04T10:35:20.7467979Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7468675Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7468679Z
2025-12-04T10:35:20.7468901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7469047Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7469213Z ================== 1 failed, 48 deselected, 2 rerun in 3.39s ===================
2025-12-04T10:35:20.7469293Z Got exit code 1
2025-12-04T10:35:20.7469386Z Retrying single test...
2025-12-04T10:35:20.7469830Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.xml
2025-12-04T10:35:20.7469963Z ============================= test session starts ==============================
2025-12-04T10:35:20.7470256Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7470342Z cachedir: .pytest_cache
2025-12-04T10:35:20.7470786Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7470888Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7470973Z configfile: pytest.ini
2025-12-04T10:35:20.7471430Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7471616Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.7472238Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7472335Z Running 1 items in this shard
2025-12-04T10:35:20.7472339Z
2025-12-04T10:35:20.7473335Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7474019Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7474487Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7475043Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7475532Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel
2025-12-04T10:35:20.7476001Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex
2025-12-04T10:35:20.7476522Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7476951Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7477329Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0
2025-12-04T10:35:20.7477814Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7478227Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0
2025-12-04T10:35:20.7478713Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7479142Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7479586Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7480053Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7480354Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.7481828Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7482295Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7483191Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7483728Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7484488Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7485072Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7485869Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7486530Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7487052Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7487690Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7488039Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7488914Z E1204 10:28:59.130000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7489021Z ('RERUN', {'yellow': True}) [2.1104s] [100%]
2025-12-04T10:35:20.7490013Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7490692Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7491155Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7491633Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7492053Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel
2025-12-04T10:35:20.7492415Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex
2025-12-04T10:35:20.7492961Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7493399Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7493779Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0
2025-12-04T10:35:20.7494262Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7494630Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0
2025-12-04T10:35:20.7495111Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7495543Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7495994Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7496456Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7496764Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.7498234Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7498697Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7499635Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7500239Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7501003Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7501584Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7502382Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7503039Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7503563Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7504196Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7504505Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7505312Z E1204 10:28:59.769000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7505421Z ('RERUN', {'yellow': True}) [0.6058s] [100%]
2025-12-04T10:35:20.7506415Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7507053Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7507515Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7508150Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7508572Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel
2025-12-04T10:35:20.7508937Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex
2025-12-04T10:35:20.7509523Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7509957Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7510331Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0
2025-12-04T10:35:20.7510812Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7511186Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0
2025-12-04T10:35:20.7511741Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7512173Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7512619Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7513077Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7513378Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.7514861Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7515323Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7516264Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7516860Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7517617Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7518198Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7518952Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7519606Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7520130Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7520770Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7521079Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7521881Z E1204 10:29:00.380000 89432 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7521963Z FAILED [0.6096s] [100%]
2025-12-04T10:35:20.7521968Z
2025-12-04T10:35:20.7522089Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7522370Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7522476Z Traceback (most recent call last):
2025-12-04T10:35:20.7522815Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7522976Z y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7523390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7523597Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7524039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7524197Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7524626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7524786Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7525238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7525509Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7526008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7526124Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7526531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7526630Z return self._compile_to_module()
2025-12-04T10:35:20.7527036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7527172Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7527667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7527776Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7528196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7528390Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7528894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7528996Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7529416Z File "/tmp/tmp7eie7i9q/g5/cg5dya7d65k3y2oopzyqnsq3d47mcd65o7hqrwxih42wq3v3lpzo.py", line 50, in <module>
2025-12-04T10:35:20.7529808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7529900Z kernel.precompile(
2025-12-04T10:35:20.7530376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7530471Z self._precompile_worker()
2025-12-04T10:35:20.7530979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7531131Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7531679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7531847Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7532223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7532424Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7532808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7533088Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7533318Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7533577Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7533643Z ^
2025-12-04T10:35:20.7534038Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7534043Z
2025-12-04T10:35:20.7534648Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7534654Z
2025-12-04T10:35:20.7534657Z
2025-12-04T10:35:20.7534836Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7535580Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.7535587Z
2025-12-04T10:35:20.7535833Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7536042Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7536130Z frames [('total', 1)]
2025-12-04T10:35:20.7536226Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7536636Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7536823Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7536905Z graph_break []
2025-12-04T10:35:20.7537185Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.7537281Z Traceback (most recent call last):
2025-12-04T10:35:20.7537668Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7537787Z y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7538203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7538413Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7538846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7539007Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7539503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7539619Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7540073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7540345Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7540793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7540911Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7541314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7541494Z return self._compile_to_module()
2025-12-04T10:35:20.7541906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7542039Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7542476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7542582Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7543004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7543239Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7543736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7543841Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7544276Z File "/tmp/tmpqnps4uev/ku/ckueenl5uqa3jupn6s6lf27hx5uc54auma3vrxmztpdq4pdwmwxg.py", line 50, in <module>
2025-12-04T10:35:20.7544673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7544763Z kernel.precompile(
2025-12-04T10:35:20.7545232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7545372Z self._precompile_worker()
2025-12-04T10:35:20.7545876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7546023Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7546532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7546696Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7547077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7547282Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7547651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7547976Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7548173Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7548436Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7548503Z ^
2025-12-04T10:35:20.7548888Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7548893Z
2025-12-04T10:35:20.7549503Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7549508Z 2025-12-04T10:35:20.7549512Z 2025-12-04T10:35:20.7549690Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7550386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7550396Z 2025-12-04T10:35:20.7550618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7550796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7550883Z frames [('total', 1)] 2025-12-04T10:35:20.7550973Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7551371Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7551615Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7551699Z graph_break [] 2025-12-04T10:35:20.7551877Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7551956Z frames [('total', 1)] 2025-12-04T10:35:20.7552045Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7552234Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7552634Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7552757Z graph_break [] 2025-12-04T10:35:20.7552876Z =================================== FAILURES =================================== 2025-12-04T10:35:20.7553157Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.7553263Z Traceback (most recent call last): 2025-12-04T10:35:20.7553604Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in 
test_to_fp8_saturated 2025-12-04T10:35:20.7553723Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7554135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7554347Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7554799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7555000Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7555436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7555559Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7556010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7556288Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7556731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7556848Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7557265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7557404Z return self._compile_to_module() 2025-12-04T10:35:20.7557817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7557953Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7558388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7558494Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7558915Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7559109Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7559609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7559714Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7560155Z File "/tmp/tmpkc0cbrvl/e5/ce5bv2ptgvsdftkb3dl6zbv5oultpep6ldfiedv3o5xcswoxaan4.py", line 50, in 2025-12-04T10:35:20.7560549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7560635Z kernel.precompile( 2025-12-04T10:35:20.7561108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7561240Z self._precompile_worker() 2025-12-04T10:35:20.7561750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7561899Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7562402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7562574Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7562949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7563191Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7563566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7563856Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T10:35:20.7564048Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7564303Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7564372Z ^ 2025-12-04T10:35:20.7564762Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7564766Z 2025-12-04T10:35:20.7565417Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7565425Z 2025-12-04T10:35:20.7565429Z 2025-12-04T10:35:20.7565610Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7566301Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7566308Z 2025-12-04T10:35:20.7566528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7566705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7566783Z frames [('total', 1)] 2025-12-04T10:35:20.7566877Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7567275Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7567590Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7567675Z graph_break [] 2025-12-04T10:35:20.7567851Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7567936Z frames [('total', 1)] 2025-12-04T10:35:20.7568029Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7568210Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7568609Z 
inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7568689Z graph_break [] 2025-12-04T10:35:20.7568861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7568947Z frames [('total', 1)] 2025-12-04T10:35:20.7569038Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7569217Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7569615Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7569695Z graph_break [] 2025-12-04T10:35:20.7570248Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.xml - 2025-12-04T10:35:20.7570388Z =========================== short test summary info ============================ 2025-12-04T10:35:20.7571114Z FAILED [0.6096s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7571402Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7571474Z ^ 2025-12-04T10:35:20.7571884Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7571895Z 2025-12-04T10:35:20.7572545Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7572592Z 2025-12-04T10:35:20.7572596Z 2025-12-04T10:35:20.7572772Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7573472Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7573477Z 2025-12-04T10:35:20.7573696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7573843Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.7574008Z ================== 1 failed, 187 deselected, 2 rerun in 3.36s ================== 2025-12-04T10:35:20.7574083Z Got exit code 1 2025-12-04T10:35:20.7574218Z Retrying single test... 2025-12-04T10:35:20.7574616Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.xml 2025-12-04T10:35:20.7574749Z ============================= test session starts ============================== 2025-12-04T10:35:20.7575050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.7575137Z cachedir: .pytest_cache 2025-12-04T10:35:20.7575592Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.7575690Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.7575775Z configfile: pytest.ini 2025-12-04T10:35:20.7576236Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.7576421Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.7577117Z stepcurrent: skipping 48 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7577211Z Running 1 items in this shard 2025-12-04T10:35:20.7577215Z 2025-12-04T10:35:20.7578210Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7578854Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7583488Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7584098Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7584525Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7584897Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7585418Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7585930Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7586322Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7586808Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 
triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7587193Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7587717Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7588150Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7588608Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7589077Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7589398Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7590875Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7591344Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7592235Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7592778Z E1204 10:29:09.891000 89614 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7593592Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7594174Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7594932Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7595589Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7596114Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7596752Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7597063Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7597873Z E1204 10:29:09.891000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7597982Z ('RERUN', {'yellow': True}) [2.1298s] [100%] 2025-12-04T10:35:20.7598991Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7599631Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7600151Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7600630Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7601053Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7601434Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7601938Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7602421Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7602807Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7603291Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7603671Z E1204 10:29:10.531000 89614 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7604155Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7604597Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7605084Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7605553Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7605908Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7607349Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7608008Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7608900Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7609439Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.7610278Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7610863Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7611620Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7612332Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7612863Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7613501Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7613814Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7614629Z E1204 10:29:10.531000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7614741Z ('RERUN', {'yellow': True}) [0.6075s] [100%] 2025-12-04T10:35:20.7615745Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7616435Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7616902Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7617435Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7617859Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7618229Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7618734Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7619243Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7619621Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7620109Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7620484Z E1204 10:29:11.141000 89614 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7620971Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7621406Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7621896Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7622375Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7622678Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7624117Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7624622Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7625510Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7626052Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.7626874Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7627462Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7628213Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7628872Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7629433Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7630072Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7630382Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7631150Z E1204 10:29:11.141000 89614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7631242Z FAILED [0.6086s] [100%] 2025-12-04T10:35:20.7631247Z 2025-12-04T10:35:20.7631364Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.7631655Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.7631767Z Traceback (most recent call last): 2025-12-04T10:35:20.7632112Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7632249Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7632659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7632871Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7633358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7633522Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7633965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7634089Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7634549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7634874Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7635320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7635455Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7635915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.7636018Z return self._compile_to_module() 2025-12-04T10:35:20.7636434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7636568Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7637009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7637165Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7637583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7637783Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7638278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7638384Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7638846Z File "/tmp/tmpw3kbw8dv/jd/cjdsncjptmogiwtesbxavcdpsxya2pmdltuxfeayzbyumnabgc3f.py", line 50, in 2025-12-04T10:35:20.7639243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7639340Z kernel.precompile( 2025-12-04T10:35:20.7639854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7639953Z self._precompile_worker() 2025-12-04T10:35:20.7640468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7640619Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7641128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7641298Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T10:35:20.7641674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7641891Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7642268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7642554Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7642759Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7643024Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7643093Z ^ 2025-12-04T10:35:20.7643506Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7643555Z 2025-12-04T10:35:20.7644180Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7644185Z 2025-12-04T10:35:20.7644189Z 2025-12-04T10:35:20.7644385Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7645092Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7645143Z 2025-12-04T10:35:20.7645378Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7645566Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7645652Z frames [('total', 1)] 2025-12-04T10:35:20.7645755Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7646166Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7646357Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7646436Z graph_break [] 2025-12-04T10:35:20.7646730Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.7646839Z Traceback (most recent call last): 2025-12-04T10:35:20.7647228Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7647352Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7647783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7647995Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7648440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7648605Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7649038Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7649162Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7649621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7649937Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7650387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7650515Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7650939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7651041Z return self._compile_to_module() 2025-12-04T10:35:20.7651457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7651605Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7652043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7652163Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7652584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7652786Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7653299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7653407Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7653895Z File "/tmp/tmpuuh1g8t8/xy/cxyyllic5xoci6rvylaviilarlo2ha3lnt6wcv2cfhtrict5eybz.py", line 50, in 2025-12-04T10:35:20.7654300Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7654391Z kernel.precompile( 2025-12-04T10:35:20.7654876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7654979Z self._precompile_worker() 2025-12-04T10:35:20.7655488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7655694Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7656205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7656386Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7656772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7656977Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7657362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7657647Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7657890Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7658160Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7658233Z ^ 2025-12-04T10:35:20.7658628Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7658633Z 2025-12-04T10:35:20.7659290Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7659295Z 2025-12-04T10:35:20.7659299Z 2025-12-04T10:35:20.7659493Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7660197Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7660205Z 2025-12-04T10:35:20.7660478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7660671Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7660758Z frames [('total', 1)] 2025-12-04T10:35:20.7660862Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7661265Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7661453Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7661541Z graph_break [] 2025-12-04T10:35:20.7661717Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7661798Z frames [('total', 1)] 2025-12-04T10:35:20.7661900Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7662081Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7662479Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7662561Z graph_break [] 2025-12-04T10:35:20.7662681Z =================================== FAILURES =================================== 2025-12-04T10:35:20.7662982Z _ TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.7663086Z Traceback (most recent call last): 2025-12-04T10:35:20.7663475Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in 
test_to_fp8_saturated 2025-12-04T10:35:20.7663604Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7664021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7664235Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7664673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7664846Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7665282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7665473Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7665976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7666255Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7666699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7666824Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7667231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7667375Z return self._compile_to_module() 2025-12-04T10:35:20.7667787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7667924Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7668376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7668487Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7668904Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7669109Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7669604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7669710Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7670189Z File "/tmp/tmpflu603bh/bq/cbqxp44iicb6iufb2ymuzdg3f5fvc2ax7r26kofnfo5i2gsiaskj.py", line 50, in 2025-12-04T10:35:20.7670584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7670679Z kernel.precompile( 2025-12-04T10:35:20.7671150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7671243Z self._precompile_worker() 2025-12-04T10:35:20.7671757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7671906Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7672424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7672590Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7672970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7673177Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7673551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7673832Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T10:35:20.7674070Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7674329Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7674406Z ^ 2025-12-04T10:35:20.7674795Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7674800Z 2025-12-04T10:35:20.7675407Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7675466Z 2025-12-04T10:35:20.7675469Z 2025-12-04T10:35:20.7675653Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7676345Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7676350Z 2025-12-04T10:35:20.7676584Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7676760Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7676850Z frames [('total', 1)] 2025-12-04T10:35:20.7676947Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7677344Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7677586Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7677666Z graph_break [] 2025-12-04T10:35:20.7677842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7677933Z frames [('total', 1)] 2025-12-04T10:35:20.7678023Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7678205Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7678609Z 
inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7678687Z graph_break [] 2025-12-04T10:35:20.7678866Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7678948Z frames [('total', 1)] 2025-12-04T10:35:20.7679039Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7679223Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7679657Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7679735Z graph_break [] 2025-12-04T10:35:20.7680295Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.xml - 2025-12-04T10:35:20.7680437Z =========================== short test summary info ============================ 2025-12-04T10:35:20.7681125Z FAILED [0.6086s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7681390Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7681455Z ^ 2025-12-04T10:35:20.7681845Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7681850Z 2025-12-04T10:35:20.7682459Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7682465Z 2025-12-04T10:35:20.7682469Z 2025-12-04T10:35:20.7682653Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7683340Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7683391Z 2025-12-04T10:35:20.7683620Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7683781Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.7683946Z ================== 1 failed, 187 deselected, 2 rerun in 3.38s ================== 2025-12-04T10:35:20.7684036Z Got exit code 1 2025-12-04T10:35:20.7684517Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.7684879Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.7685335Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.xml 2025-12-04T10:35:20.7685478Z ============================= test session starts ============================== 2025-12-04T10:35:20.7685793Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.7685912Z cachedir: .pytest_cache 2025-12-04T10:35:20.7686389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.7686506Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.7686600Z configfile: pytest.ini 2025-12-04T10:35:20.7687069Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:20.7687321Z collecting ... collected 188 items / 49 deselected / 139 selected 2025-12-04T10:35:20.7687444Z stepcurrent: skipping 49 already run items. 2025-12-04T10:35:20.7687549Z Running 139 items in this shard 2025-12-04T10:35:20.7687553Z 2025-12-04T10:35:20.7688002Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda PASSED [2.3742s] [ 0%] 2025-12-04T10:35:20.7688448Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6928s] [ 1%] 2025-12-04T10:35:20.7689432Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7690124Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7690599Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7691078Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7691521Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7691899Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7692411Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7692859Z E1204 10:29:22.101000 89796 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7693238Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7693740Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7694158Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7694638Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7695077Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7695536Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7696063Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7696419Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7697865Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7698338Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 
2025-12-04T10:35:20.7699344Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7699890Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7700653Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7701242Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7702037Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7702725Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7703248Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7703890Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7704209Z E1204 10:29:22.101000 89796 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7704969Z E1204 10:29:22.101000 89796 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7705090Z ('RERUN', {'yellow': True}) [0.4801s] [ 2%] 2025-12-04T10:35:20.7705600Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [0.8779s] [ 2%] 2025-12-04T10:35:20.7706105Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda FAILED [0.8616s] [ 2%] 2025-12-04T10:35:20.7706110Z 2025-12-04T10:35:20.7706301Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.7706571Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7706686Z Traceback (most recent call last): 2025-12-04T10:35:20.7707037Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7707160Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7707590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7707974Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7708610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7708774Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7709216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7709343Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7709807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7710083Z 
return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7710609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7710730Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7711150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7711258Z return self._compile_to_module() 2025-12-04T10:35:20.7711673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7711835Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7712271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7712391Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7712810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7713076Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7713585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7713698Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7714129Z File "/tmp/tmp_noq3ytb/gu/cgunxsafvni65swzh7z7pgrdxcoe3jhwdf6yibigvusp32vv3tir.py", line 50, in 2025-12-04T10:35:20.7714522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7714613Z kernel.precompile( 2025-12-04T10:35:20.7715094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7715193Z self._precompile_worker() 
2025-12-04T10:35:20.7715726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7715913Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7716416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7716597Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7716975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7717240Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7717620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7717904Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7718113Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7718379Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7718451Z ^ 2025-12-04T10:35:20.7718850Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7718897Z 2025-12-04T10:35:20.7719510Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7719515Z 2025-12-04T10:35:20.7719519Z 2025-12-04T10:35:20.7719707Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7720394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7720399Z 2025-12-04T10:35:20.7720625Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7720867Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7720948Z frames [('total', 1)] 2025-12-04T10:35:20.7721053Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7721245Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7721641Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7721726Z graph_break [] 2025-12-04T10:35:20.7721995Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7722091Z Traceback (most recent call last): 2025-12-04T10:35:20.7722435Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7722551Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7722966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7723215Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7723647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7723813Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7724246Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7724373Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7724821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7725087Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7725536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7725660Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7726070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7726180Z return self._compile_to_module() 2025-12-04T10:35:20.7726588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7726731Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7727209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7727314Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7727740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7727929Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7728431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7728534Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7729020Z File "/tmp/tmpm4haybre/cg/ccglg3kkkbgqtba77iuoorkipbmgpq6memshhpldgay6cxqq43hp.py", line 84, in <module> 2025-12-04T10:35:20.7729407Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.7729498Z self._wait_futures(scope) 2025-12-04T10:35:20.7729917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.7730011Z kernel = result.result() 2025-12-04T10:35:20.7730383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.7730480Z return self.result_fn() 2025-12-04T10:35:20.7730886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.7731033Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.7731364Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.7731369Z 2025-12-04T10:35:20.7731479Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7731583Z Traceback (most recent call last): 2025-12-04T10:35:20.7732043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.7732124Z result = job() 2025-12-04T10:35:20.7732624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.7732744Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.7733213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.7733357Z self._precompile_worker() 2025-12-04T10:35:20.7733861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7734016Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7734520Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7734687Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7735065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7735266Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7735643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7735932Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7736085Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7736346Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7736416Z ^ 2025-12-04T10:35:20.7736804Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7736809Z 2025-12-04T10:35:20.7736812Z 2025-12-04T10:35:20.7737466Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7737471Z 2025-12-04T10:35:20.7737475Z 2025-12-04T10:35:20.7737665Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7738349Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7738357Z 2025-12-04T10:35:20.7738579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7738805Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7738886Z frames [('total', 1)] 2025-12-04T10:35:20.7738977Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7739255Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7739655Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7739729Z graph_break [] 2025-12-04T10:35:20.7739906Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7739985Z frames [('total', 1)] 2025-12-04T10:35:20.7740076Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7740257Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7740887Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.7740974Z graph_break [] 2025-12-04T10:35:20.7741091Z =================================== FAILURES =================================== 2025-12-04T10:35:20.7741355Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7741456Z Traceback (most recent call last): 2025-12-04T10:35:20.7741798Z File 
"/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7741926Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7742335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7742548Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7743058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7743219Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7743651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7743771Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7744223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7744501Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7744935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7745053Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7745467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7745566Z return self._compile_to_module() 2025-12-04T10:35:20.7745975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7746109Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7746544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7746690Z mod = 
PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7747108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7747297Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7747793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7747899Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7748335Z File "/tmp/tmpq28ktwbd/ej/cejyztkg2iny6hlwuthq35ulchpdk7nttbfwfkq7hmvrbhmv4nrp.py", line 84, in <module> 2025-12-04T10:35:20.7748760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.7748852Z self._wait_futures(scope) 2025-12-04T10:35:20.7749273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.7749367Z kernel = result.result() 2025-12-04T10:35:20.7749740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.7749829Z return self.result_fn() 2025-12-04T10:35:20.7750230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.7750336Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.7750707Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.7750711Z 2025-12-04T10:35:20.7750839Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7750978Z Traceback (most recent call last): 2025-12-04T10:35:20.7751467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.7751546Z result = job() 2025-12-04T10:35:20.7752047Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.7752167Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.7752638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.7752732Z self._precompile_worker() 2025-12-04T10:35:20.7753293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7753441Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7753944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7754108Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7754484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7754684Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7755056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7755335Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7755498Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7755755Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7755823Z ^ 2025-12-04T10:35:20.7756212Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7756217Z 2025-12-04T10:35:20.7756221Z 2025-12-04T10:35:20.7756827Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7756879Z 2025-12-04T10:35:20.7756884Z 2025-12-04T10:35:20.7757068Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7757745Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7757750Z 2025-12-04T10:35:20.7757975Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7758153Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7758277Z frames [('total', 1)] 2025-12-04T10:35:20.7758375Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7758565Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7758960Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7759044Z graph_break [] 2025-12-04T10:35:20.7759219Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7759313Z frames [('total', 1)] 2025-12-04T10:35:20.7759401Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7759580Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7760078Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.7760201Z graph_break [] 2025-12-04T10:35:20.7760373Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7760459Z frames [('total', 1)] 2025-12-04T10:35:20.7760547Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7760727Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7761225Z inductor 
[('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.7761302Z graph_break [] 2025-12-04T10:35:20.7761855Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.xml - 2025-12-04T10:35:20.7761993Z =========================== short test summary info ============================ 2025-12-04T10:35:20.7762829Z FAILED [0.8616s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.7762846Z 2025-12-04T10:35:20.7762949Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7763054Z Traceback (most recent call last): 2025-12-04T10:35:20.7763525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.7763598Z result = job() 2025-12-04T10:35:20.7764101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.7764218Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.7764686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.7764776Z self._precompile_worker() 2025-12-04T10:35:20.7765287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7765435Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7765942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7766104Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7766624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7766831Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7767201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7767481Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7767634Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7767889Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7768000Z ^ 2025-12-04T10:35:20.7768383Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7768388Z 2025-12-04T10:35:20.7768392Z 2025-12-04T10:35:20.7768999Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7769004Z 2025-12-04T10:35:20.7769007Z 2025-12-04T10:35:20.7769184Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7769861Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7769909Z 2025-12-04T10:35:20.7770132Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7770279Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.7770466Z ============= 1 failed, 2 passed, 49 deselected, 2 rerun in 5.32s ============== 2025-12-04T10:35:20.7770543Z Got exit code 1 2025-12-04T10:35:20.7770627Z Retrying single test... 
2025-12-04T10:35:20.7771025Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.xml 2025-12-04T10:35:20.7771160Z ============================= test session starts ============================== 2025-12-04T10:35:20.7771451Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.7771536Z cachedir: .pytest_cache 2025-12-04T10:35:20.7771983Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.7772089Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.7772212Z configfile: pytest.ini 2025-12-04T10:35:20.7772678Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.7772865Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.7773467Z stepcurrent: skipping 51 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7773560Z Running 1 items in this shard 2025-12-04T10:35:20.7773565Z 2025-12-04T10:35:20.7774530Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7775176Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7775639Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7776118Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7776585Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7776951Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7777455Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7777892Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7778269Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7778795Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, 
tmp2) 2025-12-04T10:35:20.7779257Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7779742Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7780171Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7780613Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7781156Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7781464Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7782896Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7783349Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7784281Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7784816Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7785581Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7786157Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7786907Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7787569Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7788089Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7788767Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7789075Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7789840Z E1204 10:29:33.485000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7789948Z ('RERUN', {'yellow': True}) [2.2324s] [100%] 2025-12-04T10:35:20.7790912Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7791590Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7792049Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7792526Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7792989Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7793351Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7793864Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7794295Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7794673Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7795151Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7795560Z E1204 10:29:34.115000 90063 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7796093Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7796524Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7796970Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7797435Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7797742Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7799163Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7799633Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7800560Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7801093Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.7801858Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7802476Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7803230Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7803886Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7804406Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7805089Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7805398Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7806164Z E1204 10:29:34.115000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7806271Z ('RERUN', {'yellow': True}) [0.5965s] [100%] 2025-12-04T10:35:20.7807239Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7808272Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7808748Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7809234Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7809652Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7810016Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7810518Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7810954Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7811329Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7811812Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7812250Z E1204 10:29:34.707000 90063 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7812730Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7813165Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7813611Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7814074Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7814438Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7815866Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7816324Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7817267Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7817805Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.7818559Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7819193Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7819992Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7820651Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7821175Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.7821810Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7822121Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.7822879Z E1204 10:29:34.707000 90063 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7822960Z FAILED [0.5903s] [100%] 2025-12-04T10:35:20.7822975Z 2025-12-04T10:35:20.7823098Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.7823363Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7823465Z Traceback (most recent call last): 2025-12-04T10:35:20.7823846Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7823969Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7824383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7824595Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7825033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7825201Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7825647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7825874Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7826334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7826603Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7827043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7827165Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7827570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.7827711Z return self._compile_to_module() 2025-12-04T10:35:20.7828117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7828260Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7828698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7828805Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7829223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7829414Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7829915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7830014Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7830463Z File "/tmp/tmp1a_jzp_w/6d/c6dxsz36vbtu6jr4bsr4pjtozpg44wbnirs25mmysage5t5mvrmk.py", line 50, in <module> 2025-12-04T10:35:20.7830857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7830947Z kernel.precompile( 2025-12-04T10:35:20.7831416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7831507Z self._precompile_worker() 2025-12-04T10:35:20.7832018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7832167Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7832668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7832835Z binary = triton.compile(*compile_args,
**compile_kwargs) 2025-12-04T10:35:20.7833216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7833420Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7833793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7834074Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7834308Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7834568Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7834632Z ^ 2025-12-04T10:35:20.7835021Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7835026Z 2025-12-04T10:35:20.7835631Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7835679Z 2025-12-04T10:35:20.7835683Z 2025-12-04T10:35:20.7835869Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7836553Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7836558Z 2025-12-04T10:35:20.7836781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7836966Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7837047Z frames [('total', 1)] 2025-12-04T10:35:20.7837139Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7837543Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7837772Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7837852Z graph_break [] 2025-12-04T10:35:20.7838118Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7838217Z Traceback (most recent call last): 2025-12-04T10:35:20.7838558Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.7838678Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7839090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7839299Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7839730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7839893Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7840365Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7840485Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7840936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7841205Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7841646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7841762Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7842163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7842260Z return self._compile_to_module() 2025-12-04T10:35:20.7842669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7842807Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7843245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7843347Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7843766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7843999Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7844499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7844608Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7845040Z File "/tmp/tmpzdi8cq9q/kr/ckrscaiea657dvrltsary4ylwskyylweod6cpsmto36du463ajg5.py", line 50, in <module> 2025-12-04T10:35:20.7845437Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7845524Z kernel.precompile( 2025-12-04T10:35:20.7846084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7846179Z self._precompile_worker() 2025-12-04T10:35:20.7846688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7846835Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7847337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7847499Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7851489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7851786Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7852173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7852463Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.7852657Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7852924Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7852996Z ^ 2025-12-04T10:35:20.7853389Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7853394Z 2025-12-04T10:35:20.7854100Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7854109Z 2025-12-04T10:35:20.7854159Z 2025-12-04T10:35:20.7854345Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7855031Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7855037Z 2025-12-04T10:35:20.7855261Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7855444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7855538Z frames [('total', 1)] 2025-12-04T10:35:20.7855632Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7856087Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7856274Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7856359Z graph_break [] 2025-12-04T10:35:20.7856545Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7856630Z frames [('total', 1)] 2025-12-04T10:35:20.7856725Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7856913Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7857304Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7857388Z graph_break [] 2025-12-04T10:35:20.7857560Z =================================== FAILURES =================================== 2025-12-04T10:35:20.7857831Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.7857941Z Traceback (most recent call last): 2025-12-04T10:35:20.7858283Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in 
test_to_fp8_saturated 2025-12-04T10:35:20.7858408Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.7858831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.7859167Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.7859610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.7859771Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.7860205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.7860330Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.7860781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.7861058Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.7861552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.7861672Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.7862088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.7862191Z return self._compile_to_module() 2025-12-04T10:35:20.7862602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.7862741Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.7863177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.7863288Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.7863704Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.7863941Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.7864443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.7864553Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.7864987Z File "/tmp/tmpslxjh15f/6e/c6e2ati7d7zyxmnwtijlozondx4tq7ha42evr6vmnxhzami7xmfj.py", line 50, in <module> 2025-12-04T10:35:20.7865379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.7865471Z kernel.precompile( 2025-12-04T10:35:20.7865943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.7866035Z self._precompile_worker() 2025-12-04T10:35:20.7866543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.7866699Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.7867202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7867374Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7867751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.7868026Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.7868406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.7868688Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7868882Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7869146Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7869217Z ^ 2025-12-04T10:35:20.7869611Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7869657Z 2025-12-04T10:35:20.7870265Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7870270Z 2025-12-04T10:35:20.7870274Z 2025-12-04T10:35:20.7870460Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7871138Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7871143Z 2025-12-04T10:35:20.7871365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7871591Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7871676Z frames [('total', 1)] 2025-12-04T10:35:20.7871776Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7872175Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7872361Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7872443Z graph_break [] 2025-12-04T10:35:20.7872620Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7872704Z frames [('total', 1)] 2025-12-04T10:35:20.7872804Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7872985Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7873381Z inductor 
[('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7873463Z graph_break [] 2025-12-04T10:35:20.7873678Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.7873766Z frames [('total', 1)] 2025-12-04T10:35:20.7873858Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.7874039Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.7874433Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.7874514Z graph_break [] 2025-12-04T10:35:20.7875078Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.xml - 2025-12-04T10:35:20.7875219Z =========================== short test summary info ============================ 2025-12-04T10:35:20.7875925Z FAILED [0.5903s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.7876202Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7876271Z ^ 2025-12-04T10:35:20.7876661Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.7876674Z 2025-12-04T10:35:20.7877277Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.7877281Z 2025-12-04T10:35:20.7877330Z 2025-12-04T10:35:20.7877515Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.7878195Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.7878200Z 2025-12-04T10:35:20.7878431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.7878594Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.7878762Z ================== 1 failed, 187 deselected, 2 rerun in 3.45s ================== 2025-12-04T10:35:20.7878887Z Got exit code 1 2025-12-04T10:35:20.7878983Z Retrying single test... 2025-12-04T10:35:20.7879388Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.xml 2025-12-04T10:35:20.7879533Z ============================= test session starts ============================== 2025-12-04T10:35:20.7879833Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.7879928Z cachedir: .pytest_cache 2025-12-04T10:35:20.7880387Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.7880491Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.7880624Z configfile: pytest.ini 2025-12-04T10:35:20.7881098Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.7881286Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.7881898Z stepcurrent: skipping 51 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7881994Z Running 1 items in this shard
2025-12-04T10:35:20.7881999Z
2025-12-04T10:35:20.7882971Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7883619Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7884130Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7884620Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7885041Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel
2025-12-04T10:35:20.7885418Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex
2025-12-04T10:35:20.7885962Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7886407Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7886797Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0
2025-12-04T10:35:20.7887280Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7887661Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0
2025-12-04T10:35:20.7888191Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7888627Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7889075Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7889542Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7889856Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.7891325Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7891789Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7892678Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7893255Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7894021Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7894600Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7895355Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7896057Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7896587Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7897222Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7897528Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7898293Z E1204 10:29:44.230000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7898408Z ('RERUN', {'yellow': True}) [2.2306s] [100%]
2025-12-04T10:35:20.7899425Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7900105Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7900575Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7901050Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7901473Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel
2025-12-04T10:35:20.7901849Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex
2025-12-04T10:35:20.7902397Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7902835Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7903215Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0
2025-12-04T10:35:20.7903694Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7904065Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0
2025-12-04T10:35:20.7904589Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7905030Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7905478Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7905941Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7906246Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.7907915Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7908449Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7909343Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7909878Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7910636Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7911225Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7911982Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7912722Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7913246Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7913881Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7914195Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7915098Z E1204 10:29:44.852000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7915205Z ('RERUN', {'yellow': True}) [0.5876s] [100%]
2025-12-04T10:35:20.7916188Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.7916821Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7917353Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.7917831Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.7918262Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel
2025-12-04T10:35:20.7918626Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex
2025-12-04T10:35:20.7919131Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.7919564Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32)
2025-12-04T10:35:20.7920005Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0
2025-12-04T10:35:20.7920501Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2)
2025-12-04T10:35:20.7920871Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0
2025-12-04T10:35:20.7921354Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4)
2025-12-04T10:35:20.7921796Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32)
2025-12-04T10:35:20.7922245Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv)
2025-12-04T10:35:20.7922727Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask)
2025-12-04T10:35:20.7923030Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.7924494Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.7924948Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.7925883Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7926473Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7927232Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7927825Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7928576Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7929284Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7929803Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.7930445Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7930757Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.7931521Z E1204 10:29:45.440000 90245 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7931652Z FAILED [0.5868s] [100%]
2025-12-04T10:35:20.7931657Z
2025-12-04T10:35:20.7931774Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.7932050Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7932153Z Traceback (most recent call last):
2025-12-04T10:35:20.7932500Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7932627Z y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7933038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7933248Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7933694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7933857Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7934292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7934416Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7934866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7935138Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7935625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7935751Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7936163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7936263Z return self._compile_to_module()
2025-12-04T10:35:20.7936684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7936859Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7937296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7937405Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7937831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7938035Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7938532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7938635Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7939288Z File "/tmp/tmpyaul7sw1/ok/cok5zffncijn2tkbqphtlfw7zd7ky6dze72bc6ubunifdvia6ewh.py", line 50, in
2025-12-04T10:35:20.7939861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7939988Z kernel.precompile(
2025-12-04T10:35:20.7940459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7940559Z self._precompile_worker()
2025-12-04T10:35:20.7941074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7941221Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7941725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7941891Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7942324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7942538Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7942914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7943193Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7943388Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7943649Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7943714Z ^
2025-12-04T10:35:20.7944106Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7944111Z
2025-12-04T10:35:20.7944723Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7944731Z
2025-12-04T10:35:20.7944735Z
2025-12-04T10:35:20.7944922Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7945600Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7945605Z
2025-12-04T10:35:20.7945830Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7946052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7946134Z frames [('total', 1)]
2025-12-04T10:35:20.7946237Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7946635Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7946825Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7946909Z graph_break []
2025-12-04T10:35:20.7947174Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7947322Z Traceback (most recent call last):
2025-12-04T10:35:20.7947662Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7947786Z y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7948206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7948411Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7948849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7949012Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7949444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7949634Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7950084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7950355Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7950803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7950921Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7951333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7951429Z return self._compile_to_module()
2025-12-04T10:35:20.7951837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7952024Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7952464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7952577Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7952995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7953191Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7953702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7953804Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7954242Z File "/tmp/tmpr5a8359g/jb/cjbbkbc2mkqgvp4etj4dnbq4cfhvu5ehkoibczcl7cuzci4uxqnp.py", line 50, in
2025-12-04T10:35:20.7954651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7954744Z kernel.precompile(
2025-12-04T10:35:20.7955222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7955321Z self._precompile_worker()
2025-12-04T10:35:20.7955881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7956075Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7956581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7956754Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7957133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7957344Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7957721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7958049Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7958238Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7958511Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7958588Z ^
2025-12-04T10:35:20.7958978Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7958982Z
2025-12-04T10:35:20.7959586Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7959591Z
2025-12-04T10:35:20.7959634Z
2025-12-04T10:35:20.7959826Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7960506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7960513Z
2025-12-04T10:35:20.7960732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7960918Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7961009Z frames [('total', 1)]
2025-12-04T10:35:20.7961112Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7961511Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7961704Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7961790Z graph_break []
2025-12-04T10:35:20.7961965Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7962094Z frames [('total', 1)]
2025-12-04T10:35:20.7962197Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7962380Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7962770Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7962857Z graph_break []
2025-12-04T10:35:20.7962979Z =================================== FAILURES ===================================
2025-12-04T10:35:20.7963251Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _
2025-12-04T10:35:20.7963351Z Traceback (most recent call last):
2025-12-04T10:35:20.7963692Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.7963817Z y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.7964229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.7964446Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.7964888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.7965050Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.7965526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.7965647Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.7966148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.7966423Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.7966867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.7966995Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.7967402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.7967542Z return self._compile_to_module()
2025-12-04T10:35:20.7967953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.7968090Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.7968534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.7968638Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.7969058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.7969254Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.7969794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.7969901Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.7970339Z File "/tmp/tmpsn7x6ul8/67/c674fw4vlk5qvqpgz5svcbokhhuth32vhiv3hcgelfstal7waxdx.py", line 50, in
2025-12-04T10:35:20.7970731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.7970830Z kernel.precompile(
2025-12-04T10:35:20.7971301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.7971393Z self._precompile_worker()
2025-12-04T10:35:20.7971908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.7972098Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.7972609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.7972774Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.7973150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.7973363Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.7973734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.7974013Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.7974207Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7974459Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7974539Z ^
2025-12-04T10:35:20.7974926Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7974932Z
2025-12-04T10:35:20.7975539Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7975544Z
2025-12-04T10:35:20.7975552Z
2025-12-04T10:35:20.7975776Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7976461Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7976467Z
2025-12-04T10:35:20.7976695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7976869Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7976960Z frames [('total', 1)]
2025-12-04T10:35:20.7977057Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7977456Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7977692Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7977776Z graph_break []
2025-12-04T10:35:20.7977951Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7978040Z frames [('total', 1)]
2025-12-04T10:35:20.7978130Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7978312Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7978717Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7978795Z graph_break []
2025-12-04T10:35:20.7978980Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.7979216Z frames [('total', 1)]
2025-12-04T10:35:20.7979306Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.7979491Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.7979879Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.7979955Z graph_break []
2025-12-04T10:35:20.7980516Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.xml -
2025-12-04T10:35:20.7980654Z =========================== short test summary info ============================
2025-12-04T10:35:20.7981314Z FAILED [0.5868s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.7981575Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.7981691Z ^
2025-12-04T10:35:20.7982080Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.7982087Z
2025-12-04T10:35:20.7982687Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.7982692Z
2025-12-04T10:35:20.7982695Z
2025-12-04T10:35:20.7982880Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.7983551Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7983556Z
2025-12-04T10:35:20.7983781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.7983933Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.7984101Z ================== 1 failed, 187 deselected, 2 rerun in 3.44s ==================
2025-12-04T10:35:20.7984186Z Got exit code 1
2025-12-04T10:35:20.7984657Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda
2025-12-04T10:35:20.7985004Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.7985449Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.xml
2025-12-04T10:35:20.7985585Z ============================= test session starts ==============================
2025-12-04T10:35:20.7985879Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.7985965Z cachedir: .pytest_cache
2025-12-04T10:35:20.7986411Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.7986516Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.7986602Z configfile: pytest.ini
2025-12-04T10:35:20.7987127Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.7987318Z collecting ... collected 188 items / 52 deselected / 136 selected 2025-12-04T10:35:20.7987432Z stepcurrent: skipping 52 already run items. 2025-12-04T10:35:20.7987532Z Running 136 items in this shard 2025-12-04T10:35:20.7987538Z 2025-12-04T10:35:20.7988529Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.7989171Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.7989677Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.7990155Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.7990580Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.7990943Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.7991448Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.7991918Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.7992301Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.7992782Z E1204 10:29:54.946000 90427 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.7993152Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.7993636Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.7994065Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.7994512Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.7994983Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.7995286Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.7996763Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.7997222Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.7998119Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.7998692Z E1204 
10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.7999450Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8000024Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8000771Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8001470Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8001992Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8002630Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8002935Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8003694Z E1204 10:29:54.946000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8003845Z ('RERUN', {'yellow': True}) [2.1362s] [ 0%] 2025-12-04T10:35:20.8004827Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8005467Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8005932Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8006408Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8006830Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8007192Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8007696Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.8008452Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.8008836Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.8009317Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.8009684Z E1204 10:29:55.591000 90427 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.8010171Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.8010657Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.8011103Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.8011568Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.8011868Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8013298Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8013809Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8014700Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8015229Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.8016086Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8016673Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8017426Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8018080Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8018596Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8019317Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8019626Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8020387Z E1204 10:29:55.591000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8020542Z ('RERUN', {'yellow': True}) [0.6114s] [ 0%] 2025-12-04T10:35:20.8021527Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8022161Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8022627Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8023150Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8023568Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8023933Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8024439Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.8024878Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.8025300Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.8025781Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.8026153Z E1204 10:29:56.203000 90427 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.8026635Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.8027069Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.8027513Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.8028020Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.8028328Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8029749Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8030204Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8031092Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8031633Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.8032458Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8033033Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8033783Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8034437Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8034995Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8035630Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8035989Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8036748Z E1204 10:29:56.203000 90427 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8036875Z FAILED [0.6103s] [ 0%] 2025-12-04T10:35:20.8036880Z 2025-12-04T10:35:20.8037003Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.8037279Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8037385Z Traceback (most recent call last): 2025-12-04T10:35:20.8037724Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8037844Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8038260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8038467Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8038900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8039105Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8039537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8039657Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8040107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8040378Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8040821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8040941Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8041346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.8041443Z return self._compile_to_module() 2025-12-04T10:35:20.8041852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8041991Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8042424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8042527Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8042989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8043183Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8043686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8043786Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8044226Z File "/tmp/tmpxi43olr7/yn/cynvtary3itzbawlh6affqhesb2xtcqbaisowtegssd6eai4qren.py", line 50, in 2025-12-04T10:35:20.8044619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8044747Z kernel.precompile( 2025-12-04T10:35:20.8045218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8045310Z self._precompile_worker() 2025-12-04T10:35:20.8045815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8045965Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8046515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8046683Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T10:35:20.8047103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8047307Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8047680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8047959Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8048149Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8048408Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8048478Z ^ 2025-12-04T10:35:20.8048866Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8048871Z 2025-12-04T10:35:20.8049520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8049528Z 2025-12-04T10:35:20.8049534Z 2025-12-04T10:35:20.8049715Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8050409Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8050414Z 2025-12-04T10:35:20.8050636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8050821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8050901Z frames [('total', 1)] 2025-12-04T10:35:20.8050991Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8051390Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8051578Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8051660Z graph_break [] 2025-12-04T10:35:20.8051933Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8052033Z Traceback (most recent call last): 2025-12-04T10:35:20.8052372Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8052490Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8052942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8053168Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8053628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8053798Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8054269Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8054438Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8054922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8055212Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8055685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8055813Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8056246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8056351Z return self._compile_to_module() 2025-12-04T10:35:20.8056794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8056972Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8057414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8057520Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8057939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8058134Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8058628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8058738Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8059224Z File "/tmp/tmp550sq1hj/o5/co5t4hufn7rxg6kbmom234p6bkqznscbt2mhsj7pdu4i2zqsfowc.py", line 50, in 2025-12-04T10:35:20.8059667Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8059754Z kernel.precompile( 2025-12-04T10:35:20.8060224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8060328Z self._precompile_worker() 2025-12-04T10:35:20.8060831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8060979Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8061485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8061648Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8062026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8062233Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8062604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8062888Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8063076Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8063379Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8063448Z ^ 2025-12-04T10:35:20.8063831Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8063836Z 2025-12-04T10:35:20.8064446Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8064453Z 2025-12-04T10:35:20.8064459Z 2025-12-04T10:35:20.8064637Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8065372Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8065377Z 2025-12-04T10:35:20.8065596Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8065775Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8065860Z frames [('total', 1)] 2025-12-04T10:35:20.8065950Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8066349Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8066532Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8066651Z graph_break [] 2025-12-04T10:35:20.8066832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8066910Z frames [('total', 1)] 2025-12-04T10:35:20.8067002Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8067183Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8067586Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8067696Z graph_break [] 2025-12-04T10:35:20.8067863Z =================================== FAILURES =================================== 2025-12-04T10:35:20.8068230Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8068376Z Traceback (most recent call last): 2025-12-04T10:35:20.8068813Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in 
test_to_fp8_saturated 2025-12-04T10:35:20.8068970Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8069480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8069695Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8070133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8070294Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8070726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8070847Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8071301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8071575Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8072018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8072137Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8072544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8072639Z return self._compile_to_module() 2025-12-04T10:35:20.8073045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8073227Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8073663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8073773Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8074189Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8074384Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8074883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8075031Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8075466Z File "/tmp/tmp15sxofn6/zk/czkc2el7owodv7cp32zudnyv5dcasm6proasoefqpat425aoh4kw.py", line 50, in <module> 2025-12-04T10:35:20.8075861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8075947Z kernel.precompile( 2025-12-04T10:35:20.8076418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8076508Z self._precompile_worker() 2025-12-04T10:35:20.8077010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8077207Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8077709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8077878Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8078253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8078456Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8078834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8079112Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8079307Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8079612Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8079686Z ^ 2025-12-04T10:35:20.8080081Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8080089Z 2025-12-04T10:35:20.8080691Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8080696Z 2025-12-04T10:35:20.8080700Z 2025-12-04T10:35:20.8080882Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8081566Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8081571Z 2025-12-04T10:35:20.8081796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8081979Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8082060Z frames [('total', 1)] 2025-12-04T10:35:20.8082156Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8082551Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8082741Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8082821Z graph_break [] 2025-12-04T10:35:20.8083124Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8083204Z frames [('total', 1)] 2025-12-04T10:35:20.8083299Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8083477Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8083977Z inductor 
[('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8084061Z graph_break [] 2025-12-04T10:35:20.8084240Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8084326Z frames [('total', 1)] 2025-12-04T10:35:20.8084466Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8084648Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8085046Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8085122Z graph_break [] 2025-12-04T10:35:20.8085686Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.xml - 2025-12-04T10:35:20.8085828Z =========================== short test summary info ============================ 2025-12-04T10:35:20.8086503Z FAILED [0.6103s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8086810Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8086880Z ^ 2025-12-04T10:35:20.8087276Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8087285Z 2025-12-04T10:35:20.8087885Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8087889Z 2025-12-04T10:35:20.8087895Z 2025-12-04T10:35:20.8088072Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8088758Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8088763Z 2025-12-04T10:35:20.8088987Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8089188Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.8089354Z ================== 1 failed, 52 deselected, 2 rerun in 3.39s =================== 2025-12-04T10:35:20.8089438Z Got exit code 1 2025-12-04T10:35:20.8089528Z Retrying single test... 2025-12-04T10:35:20.8089923Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.xml 2025-12-04T10:35:20.8090056Z ============================= test session starts ============================== 2025-12-04T10:35:20.8090350Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.8090439Z cachedir: .pytest_cache 2025-12-04T10:35:20.8090887Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.8090989Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.8091078Z configfile: pytest.ini 2025-12-04T10:35:20.8091538Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.8091724Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.8092346Z stepcurrent: skipping 52 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8092441Z Running 1 items in this shard 2025-12-04T10:35:20.8092446Z 2025-12-04T10:35:20.8093475Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8094118Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8094581Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8095103Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8095523Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8095893Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8096402Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.8096832Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.8097255Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.8097738Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 
triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.8098111Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.8098598Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.8099085Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.8099554Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.8100071Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.8100387Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8101822Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8102282Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8103171Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8103706Z E1204 10:30:05.723000 90609 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8104517Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8105098Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8105907Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8106566Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8107136Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8107986Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8108318Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8109076Z E1204 10:30:05.723000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8109297Z ('RERUN', {'yellow': True}) [2.1380s] [100%] 2025-12-04T10:35:20.8110285Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8110927Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8111394Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8111869Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8112344Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8112727Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8113231Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.8113667Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.8114043Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.8114520Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.8114893Z E1204 10:30:06.368000 90609 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.8115373Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.8115851Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.8116297Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.8116815Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.8120962Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8122424Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8122988Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8123882Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8124427Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.8125196Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8125850Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8126614Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8127277Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8127815Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8128496Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8128819Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8129580Z E1204 10:30:06.368000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8129696Z ('RERUN', {'yellow': True}) [0.6123s] [100%] 2025-12-04T10:35:20.8130693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8131337Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8131820Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8132301Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8132776Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8133145Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8133649Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.8134087Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.8134464Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.8134999Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.8135369Z E1204 10:30:06.982000 90609 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.8135856Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.8136289Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.8136736Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.8137245Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.8137551Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8138984Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8139515Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8140451Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8140991Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.8141748Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8142333Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8143087Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8143754Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8144278Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8144962Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8145277Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8146038Z E1204 10:30:06.982000 90609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8146130Z FAILED [0.6128s] [100%] 2025-12-04T10:35:20.8146136Z 2025-12-04T10:35:20.8146253Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.8146576Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8146680Z Traceback (most recent call last): 2025-12-04T10:35:20.8147021Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8147160Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8147576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8147785Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8148230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8148436Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8148881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8149008Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8149462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8149735Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8150175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8150302Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8150711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.8150813Z return self._compile_to_module() 2025-12-04T10:35:20.8151267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8151403Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8151843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8151953Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8152373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8152572Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8153066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8153169Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8153598Z File "/tmp/tmpqpysmj_v/rl/crlee4xho3sfuulocci4ks62a5c4qqccrykw6r5bpu3nv64errlx.py", line 50, in <module> 2025-12-04T10:35:20.8153987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8154077Z kernel.precompile( 2025-12-04T10:35:20.8154557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8154654Z self._precompile_worker() 2025-12-04T10:35:20.8155208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8155356Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8155858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8156026Z binary = triton.compile(*compile_args,
**compile_kwargs) 2025-12-04T10:35:20.8156407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8156620Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8157059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8157342Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8157540Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8157800Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8157866Z ^ 2025-12-04T10:35:20.8158267Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8158272Z 2025-12-04T10:35:20.8158883Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8158928Z 2025-12-04T10:35:20.8158932Z 2025-12-04T10:35:20.8159121Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8159813Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8159819Z 2025-12-04T10:35:20.8160048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8160230Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8160313Z frames [('total', 1)] 2025-12-04T10:35:20.8160409Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8160812Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8161013Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8161136Z graph_break [] 2025-12-04T10:35:20.8161419Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8161532Z Traceback (most recent call last): 2025-12-04T10:35:20.8161873Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8161997Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8162417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8162625Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8163071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8163234Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8163676Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8163800Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8164254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8164531Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8165025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8165147Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8165561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8165664Z return self._compile_to_module() 2025-12-04T10:35:20.8166082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8166226Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8166661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8166815Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8167232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8167425Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8167928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8168032Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8168456Z File "/tmp/tmp65vf618w/xg/cxgcneslrqpl3d4g7il3uy432ce4emgcbefjpi7ub4gms3ms35mq.py", line 50, in <module> 2025-12-04T10:35:20.8168899Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8168989Z kernel.precompile( 2025-12-04T10:35:20.8169472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8169567Z self._precompile_worker() 2025-12-04T10:35:20.8170075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8170233Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8170738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8170915Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8171296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8171555Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8171939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8172224Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8172415Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8172689Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8172761Z ^ 2025-12-04T10:35:20.8173161Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8173166Z 2025-12-04T10:35:20.8173768Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8173776Z 2025-12-04T10:35:20.8173780Z 2025-12-04T10:35:20.8173963Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8174652Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8174658Z 2025-12-04T10:35:20.8174879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8175104Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8175187Z frames [('total', 1)] 2025-12-04T10:35:20.8175282Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8175691Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8175876Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8175964Z graph_break [] 2025-12-04T10:35:20.8176143Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8176223Z frames [('total', 1)] 2025-12-04T10:35:20.8176319Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8176542Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8176938Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8177022Z graph_break [] 2025-12-04T10:35:20.8177148Z =================================== FAILURES =================================== 2025-12-04T10:35:20.8177429Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8177529Z Traceback (most recent call last): 2025-12-04T10:35:20.8177871Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in 
test_to_fp8_saturated 2025-12-04T10:35:20.8178001Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8178459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8178668Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8179156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8179315Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8179760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8179878Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8180328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8180695Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8181182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8181312Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8181718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8181818Z return self._compile_to_module() 2025-12-04T10:35:20.8182231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8182369Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8182813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8182921Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8183339Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8183543Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8184041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8184148Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8184581Z File "/tmp/tmp70tlfw8j/j4/cj4kyrn55u2oea2ot3ifruisxtqfr3swourwuwqnhkj7pz74shex.py", line 50, in <module> 2025-12-04T10:35:20.8185016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8185113Z kernel.precompile( 2025-12-04T10:35:20.8185583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8185678Z self._precompile_worker() 2025-12-04T10:35:20.8186191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8186349Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8186860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8187072Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8187459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8187674Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8188048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8188338Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8188533Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8188791Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8188913Z ^ 2025-12-04T10:35:20.8189301Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8189309Z 2025-12-04T10:35:20.8189914Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8189919Z 2025-12-04T10:35:20.8189927Z 2025-12-04T10:35:20.8190112Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8190802Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8190807Z 2025-12-04T10:35:20.8191035Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8191218Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8191342Z frames [('total', 1)] 2025-12-04T10:35:20.8191448Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8191850Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8192043Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8192125Z graph_break [] 2025-12-04T10:35:20.8192304Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8192390Z frames [('total', 1)] 2025-12-04T10:35:20.8192486Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8192671Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8193077Z inductor 
[('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8193155Z graph_break [] 2025-12-04T10:35:20.8193341Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8193426Z frames [('total', 1)] 2025-12-04T10:35:20.8193520Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8193710Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8194106Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8194187Z graph_break [] 2025-12-04T10:35:20.8194818Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.xml - 2025-12-04T10:35:20.8194966Z =========================== short test summary info ============================ 2025-12-04T10:35:20.8195646Z FAILED [0.6128s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8195910Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8195985Z ^ 2025-12-04T10:35:20.8196376Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8196424Z 2025-12-04T10:35:20.8197025Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8197030Z 2025-12-04T10:35:20.8197034Z 2025-12-04T10:35:20.8197219Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8197903Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8197907Z 2025-12-04T10:35:20.8198136Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8198329Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.8198497Z ================== 1 failed, 187 deselected, 2 rerun in 3.40s ================== 2025-12-04T10:35:20.8198587Z Got exit code 1 2025-12-04T10:35:20.8198670Z Retrying single test... 2025-12-04T10:35:20.8199074Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.xml 2025-12-04T10:35:20.8199215Z ============================= test session starts ============================== 2025-12-04T10:35:20.8199507Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.8199601Z cachedir: .pytest_cache 2025-12-04T10:35:20.8200053Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.8200155Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.8200246Z configfile: pytest.ini 2025-12-04T10:35:20.8200749Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.8200938Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.8201559Z stepcurrent: skipping 52 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8201650Z Running 1 items in this shard 2025-12-04T10:35:20.8201655Z 2025-12-04T10:35:20.8202650Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8203286Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8203760Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8204241Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8204665Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8205093Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8205605Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.8206094Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.8206475Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.8206957Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 
triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.8207380Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.8208132Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.8208572Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.8209019Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.8209483Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.8209875Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8211312Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8211773Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8212717Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8213267Z E1204 10:30:16.517000 90791 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8214027Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8214617Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8215367Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8216082Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8216604Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8217300Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8217618Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8218377Z E1204 10:30:16.517000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8218491Z ('RERUN', {'yellow': True}) [2.1103s] [100%] 2025-12-04T10:35:20.8219531Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8220225Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8220694Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8221172Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8221608Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8222018Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8222530Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.8222975Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.8223357Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.8223854Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.8224230Z E1204 10:30:17.151000 90791 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.8224752Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.8225190Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.8225641Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.8226117Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.8226422Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8227869Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8228331Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8229263Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8229802Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.8230561Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8231151Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8231952Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8232632Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8233153Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8233798Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8234151Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8234911Z E1204 10:30:17.151000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8235027Z ('RERUN', {'yellow': True}) [0.6017s] [100%] 2025-12-04T10:35:20.8236070Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8236773Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8237243Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8237724Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8238155Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8238530Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8239041Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.8239476Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T10:35:20.8239868Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T10:35:20.8240354Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T10:35:20.8240729Z E1204 10:30:17.758000 90791 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T10:35:20.8241263Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T10:35:20.8241699Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T10:35:20.8242153Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T10:35:20.8242618Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T10:35:20.8242919Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8244895Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8245352Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8246304Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8246885Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.8247653Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8248232Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8248981Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8249693Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8250216Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8250861Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8251169Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8251938Z E1204 10:30:17.758000 90791 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8252020Z FAILED [0.6050s] [100%] 2025-12-04T10:35:20.8252027Z 2025-12-04T10:35:20.8252148Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.8252434Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8252538Z Traceback (most recent call last): 2025-12-04T10:35:20.8252889Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8253070Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8253486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8253700Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8254135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8254298Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8254738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8254900Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8255369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8255646Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8256091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8256220Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8256625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.8256729Z return self._compile_to_module() 2025-12-04T10:35:20.8257185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8257320Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8257766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8257872Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8258295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8258497Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8258997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8259155Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8259586Z File "/tmp/tmprjtwvcfc/el/cel3jf55yv3fdcxtist7gkgqhdy4whuzcb3wsuhdw7szv3qsvqe2.py", line 50, in <module> 2025-12-04T10:35:20.8260109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8260212Z kernel.precompile( 2025-12-04T10:35:20.8260680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8260782Z self._precompile_worker() 2025-12-04T10:35:20.8261297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8261445Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8261955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8262121Z binary = triton.compile(*compile_args,
**compile_kwargs) 2025-12-04T10:35:20.8262504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8262720Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8263093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8263385Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8263577Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8263883Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8263961Z ^ 2025-12-04T10:35:20.8264351Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8264356Z 2025-12-04T10:35:20.8264962Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8264972Z 2025-12-04T10:35:20.8264976Z 2025-12-04T10:35:20.8265153Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8265891Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8265896Z 2025-12-04T10:35:20.8266122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8266300Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8266386Z frames [('total', 1)] 2025-12-04T10:35:20.8266482Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8266886Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8267080Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8267196Z graph_break [] 2025-12-04T10:35:20.8267476Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8267577Z Traceback (most recent call last): 2025-12-04T10:35:20.8267915Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8268042Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8268453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8268658Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8269101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8269259Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8269738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8269860Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8270315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8270592Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8271035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8271156Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8271562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8271659Z return self._compile_to_module() 2025-12-04T10:35:20.8272071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8272208Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8272643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8272758Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8273178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8273373Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8273913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8274016Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8274450Z File "/tmp/tmpgifzhj15/jn/cjnekpnjz632m25jtpp77kcxngtyb7yx4vqxkyzhoqszpcrzjbox.py", line 50, in <module> 2025-12-04T10:35:20.8274840Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8274939Z kernel.precompile( 2025-12-04T10:35:20.8275409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8275548Z self._precompile_worker() 2025-12-04T10:35:20.8276111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8276258Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8276759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8276924Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8277304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8277582Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8277952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8278234Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8278430Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8278684Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8278751Z ^ 2025-12-04T10:35:20.8279140Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8279145Z 2025-12-04T10:35:20.8279749Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8279754Z 2025-12-04T10:35:20.8279760Z 2025-12-04T10:35:20.8279987Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8280675Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8280682Z 2025-12-04T10:35:20.8280907Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8281083Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8281165Z frames [('total', 1)] 2025-12-04T10:35:20.8281262Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8281660Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8281846Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8281928Z graph_break [] 2025-12-04T10:35:20.8282102Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8282190Z frames [('total', 1)] 2025-12-04T10:35:20.8282279Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8282460Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8282855Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8282931Z graph_break [] 2025-12-04T10:35:20.8283050Z =================================== FAILURES =================================== 2025-12-04T10:35:20.8283372Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8283473Z Traceback (most recent call last): 2025-12-04T10:35:20.8283816Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in 
test_to_fp8_saturated 2025-12-04T10:35:20.8283938Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8284348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8284563Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8285041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8285206Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8285633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8285752Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8286206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8286474Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8286920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8287084Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8287485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8287585Z return self._compile_to_module() 2025-12-04T10:35:20.8287996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8288129Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8288565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8288668Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8289091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8289280Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8289824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8289932Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8290362Z File "/tmp/tmprlwjb0us/sr/csrc5rbquzx4tojj4mzmnf5qdhfus6dc6dgsqjki3e3xcrayg3gl.py", line 50, in <module> 2025-12-04T10:35:20.8290758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8290847Z kernel.precompile( 2025-12-04T10:35:20.8291317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8291418Z self._precompile_worker() 2025-12-04T10:35:20.8291924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8292071Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8292578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8292745Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8293128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8293329Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8293750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8294034Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8294224Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8294481Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8294552Z ^ 2025-12-04T10:35:20.8294938Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8294983Z 2025-12-04T10:35:20.8295593Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8295599Z 2025-12-04T10:35:20.8295603Z 2025-12-04T10:35:20.8295803Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8296527Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8296532Z 2025-12-04T10:35:20.8296757Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8296934Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8297062Z frames [('total', 1)] 2025-12-04T10:35:20.8297152Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8297550Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8297735Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8297816Z graph_break [] 2025-12-04T10:35:20.8297997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8298083Z frames [('total', 1)] 2025-12-04T10:35:20.8298171Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8298353Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8298744Z inductor 
[('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8298826Z graph_break [] 2025-12-04T10:35:20.8299002Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8299203Z frames [('total', 1)] 2025-12-04T10:35:20.8299297Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8299480Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8299867Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8299948Z graph_break [] 2025-12-04T10:35:20.8300509Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.xml - 2025-12-04T10:35:20.8300655Z =========================== short test summary info ============================ 2025-12-04T10:35:20.8301328Z FAILED [0.6050s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8301587Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8301666Z ^ 2025-12-04T10:35:20.8302052Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8302059Z 2025-12-04T10:35:20.8302661Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8302666Z 2025-12-04T10:35:20.8302670Z 2025-12-04T10:35:20.8302892Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8303575Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8303580Z 2025-12-04T10:35:20.8303802Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8303954Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.8304127Z ================== 1 failed, 187 deselected, 2 rerun in 3.35s ================== 2025-12-04T10:35:20.8304245Z Got exit code 1 2025-12-04T10:35:20.8304725Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8305084Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.8305481Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.xml 2025-12-04T10:35:20.8305619Z ============================= test session starts ============================== 2025-12-04T10:35:20.8305907Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.8305998Z cachedir: .pytest_cache 2025-12-04T10:35:20.8306451Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.8306595Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.8306680Z configfile: pytest.ini 2025-12-04T10:35:20.8307143Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
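Every `float8_e4m3fn` failure in this log has the same cause, and the log itself states it. The kernel metadata reports `'cc': 86`, the generated kernel stores to `'*fp8e4nv'` (the Triton type Inductor emits for `float8_e4m3fn`), and Triton rejects that type on this architecture, offering only `('fp8e4b15', 'fp8e5')`. The `float8_e5m2` variants pass on the same runner, which fits that picture. Below is a minimal sketch of a capability check. It assumes, and this log does not state, that Triton's NVIDIA backend enables `fp8e4nv` only from compute capability 8.9 upward. The helper names `supported_triton_fp8_dtypes` and `expects_e4m3fn_compile_failure` are hypothetical and are not PyTorch or Triton APIs.

```python
from typing import Tuple

# Assumption: Triton's NVIDIA backend always offers these two fp8 types and
# adds "fp8e4nv" (used for torch.float8_e4m3fn) only at compute capability 8.9+.
# The 8.9 threshold is not stated in this log; verify it against your Triton version.
BASE_FP8_DTYPES = ("fp8e5", "fp8e4b15")
FP8E4NV_MIN_CAPABILITY = (8, 9)


def supported_triton_fp8_dtypes(capability: Tuple[int, int]) -> Tuple[str, ...]:
    """Hypothetical helper: fp8 type names assumed to compile on this device."""
    dtypes = set(BASE_FP8_DTYPES)
    if capability >= FP8E4NV_MIN_CAPABILITY:
        dtypes.add("fp8e4nv")
    # Sorted, so the tuple lists names in the same order as the error message.
    return tuple(sorted(dtypes))


def expects_e4m3fn_compile_failure(capability: Tuple[int, int]) -> bool:
    """Hypothetical helper: True if a float8_e4m3fn cast is expected to fail to compile."""
    return "fp8e4nv" not in supported_triton_fp8_dtypes(capability)


if __name__ == "__main__":
    # 'cc': 86 from the kernel metadata in this log, i.e. capability (8, 6).
    log_capability = (8, 6)
    print(supported_triton_fp8_dtypes(log_capability))
    print(expects_e4m3fn_compile_failure(log_capability))
```

On a GPU runner, the capability tuple would come from `torch.cuda.get_device_capability()`, which returns `(major, minor)`. The sketch hard-codes the value from this log so that it runs without a GPU. A test could then skip the `float8_e4m3fn` cases whenever the helper returns `True`.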
2025-12-04T10:35:20.8307332Z collecting ... collected 188 items / 53 deselected / 135 selected 2025-12-04T10:35:20.8307448Z stepcurrent: skipping 53 already run items. 2025-12-04T10:35:20.8307538Z Running 135 items in this shard 2025-12-04T10:35:20.8307543Z 2025-12-04T10:35:20.8308184Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda PASSED [2.3574s] [ 0%] 2025-12-04T10:35:20.8308633Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6871s] [ 1%] 2025-12-04T10:35:20.8309689Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8310338Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8310803Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8311277Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8311704Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8312068Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8312533Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8312912Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8313392Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8313827Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8314306Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8314752Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8315218Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8315616Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8319533Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8320017Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8320914Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8321522Z E1204 10:30:28.722000 90973 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8322292Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8322901Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8323656Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8324321Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8324840Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8325486Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8325793Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8326554Z E1204 10:30:28.722000 90973 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8326666Z ('RERUN', {'yellow': True}) [0.4334s] [ 2%] 2025-12-04T10:35:20.8327173Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [0.8295s] [ 2%] 2025-12-04T10:35:20.8327617Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda FAILED [0.8208s] [ 2%] 2025-12-04T10:35:20.8327623Z 2025-12-04T10:35:20.8327736Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.8328049Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.8328157Z Traceback (most recent call last): 2025-12-04T10:35:20.8328500Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8328627Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8329040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8329250Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8329732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8329895Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8330331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8330455Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8330990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8331265Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8331705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8331868Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8332278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8332377Z return self._compile_to_module() 2025-12-04T10:35:20.8332788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8332923Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8333359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8333470Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8333884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8334079Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8334578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8334681Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8335117Z File "/tmp/tmp4fmv9ce9/qk/cqkj37cahcu2akcpr46yuu6gzzggkumam7fyykhm7c7rru63cx3r.py", line 48, in <module> 2025-12-04T10:35:20.8335506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8335593Z kernel.precompile( 2025-12-04T10:35:20.8336115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8336212Z self._precompile_worker() 2025-12-04T10:35:20.8336717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line
463, in _precompile_worker 2025-12-04T10:35:20.8336865Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8337368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8337535Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8337912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8338122Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8338536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8338820Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8339012Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8339335Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8339412Z ^ 2025-12-04T10:35:20.8339798Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8339849Z 2025-12-04T10:35:20.8340459Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8340464Z 2025-12-04T10:35:20.8340468Z 2025-12-04T10:35:20.8340658Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8341387Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8341392Z 2025-12-04T10:35:20.8341619Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8341801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8341924Z frames [('total', 1)] 2025-12-04T10:35:20.8342021Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8342211Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8342409Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8342494Z graph_break [] 2025-12-04T10:35:20.8342757Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.8342863Z Traceback (most recent call last): 2025-12-04T10:35:20.8343203Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8343323Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8343738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8343947Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8344384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8344546Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8344981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8345101Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8345559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8345824Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8346315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8346434Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8346846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8346945Z return self._compile_to_module() 2025-12-04T10:35:20.8347354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8347488Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8347929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8348080Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8348500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8348695Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8349196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8349303Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8349743Z File "/tmp/tmpow8jft64/jo/cjomr743uymceqrqlwtpvyhmrrneyirmtgtw24mimmkavebltcz5.py", line 80, in <module> 2025-12-04T10:35:20.8350172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in
wait 2025-12-04T10:35:20.8350264Z self._wait_futures(scope) 2025-12-04T10:35:20.8350694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.8350792Z kernel = result.result() 2025-12-04T10:35:20.8351163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.8351302Z return self.result_fn() 2025-12-04T10:35:20.8351705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.8351846Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.8352175Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.8352180Z 2025-12-04T10:35:20.8352287Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8352395Z Traceback (most recent call last): 2025-12-04T10:35:20.8352851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.8352927Z result = job() 2025-12-04T10:35:20.8353434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.8353549Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.8354018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.8354119Z self._precompile_worker() 2025-12-04T10:35:20.8354625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8354776Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8355281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in 
_precompile_config 2025-12-04T10:35:20.8355442Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8355865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8356079Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8356456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8356733Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8356888Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8357148Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8357218Z ^ 2025-12-04T10:35:20.8357599Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8357611Z 2025-12-04T10:35:20.8357615Z 2025-12-04T10:35:20.8358292Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8358298Z 2025-12-04T10:35:20.8358301Z 2025-12-04T10:35:20.8358480Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8359160Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8359165Z 2025-12-04T10:35:20.8359386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8359570Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8359695Z frames [('total', 1)] 2025-12-04T10:35:20.8359785Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8359973Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8360168Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8360247Z graph_break [] 2025-12-04T10:35:20.8360422Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8360505Z frames [('total', 1)] 2025-12-04T10:35:20.8360598Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8360822Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8361122Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.8361244Z graph_break [] 2025-12-04T10:35:20.8361360Z =================================== FAILURES =================================== 2025-12-04T10:35:20.8361624Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.8361733Z Traceback (most recent call last): 2025-12-04T10:35:20.8362072Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8362199Z y_compiled = compiled_fp8_cast(x, dst_dtype) 
2025-12-04T10:35:20.8362614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8362820Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8363258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8363417Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8363857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8363976Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8364427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8364699Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8365141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8365260Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8365673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8365771Z return self._compile_to_module() 2025-12-04T10:35:20.8366182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8366319Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8366753Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8366865Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8367284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in 
load_by_key_path 2025-12-04T10:35:20.8367527Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8368025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8368128Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8368567Z File "/tmp/tmp9wf8bfri/ij/cijkd6tblkaqxfxpirbwvg3pdzqg6fqsfv4argbcwvnotcrhihbq.py", line 80, in <module> 2025-12-04T10:35:20.8368947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.8369038Z self._wait_futures(scope) 2025-12-04T10:35:20.8369501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.8369593Z kernel = result.result() 2025-12-04T10:35:20.8369973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.8370060Z return self.result_fn() 2025-12-04T10:35:20.8370464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.8370616Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.8370940Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.8370945Z 2025-12-04T10:35:20.8371053Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8371198Z Traceback (most recent call last): 2025-12-04T10:35:20.8371651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.8371734Z result = job() 2025-12-04T10:35:20.8372237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.8372349Z kernel.precompile(warm_cache_only=True)
2025-12-04T10:35:20.8372823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.8372914Z self._precompile_worker() 2025-12-04T10:35:20.8373422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8373566Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8374073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8374237Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8374616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8374817Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8375196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8379543Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8379723Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8379988Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8380061Z ^ 2025-12-04T10:35:20.8380461Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8380470Z 2025-12-04T10:35:20.8380474Z 2025-12-04T10:35:20.8381080Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8381088Z 2025-12-04T10:35:20.8381092Z 2025-12-04T10:35:20.8381279Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8382026Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8382033Z 2025-12-04T10:35:20.8382266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8382448Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8382537Z frames [('total', 1)] 2025-12-04T10:35:20.8382643Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8382829Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8383025Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8383157Z graph_break [] 2025-12-04T10:35:20.8383331Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8383424Z frames [('total', 1)] 2025-12-04T10:35:20.8383522Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8383705Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8384016Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.8384100Z graph_break [] 2025-12-04T10:35:20.8384330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8384425Z frames [('total', 1)] 2025-12-04T10:35:20.8384518Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8384742Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8385047Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.8385127Z 
graph_break [] 2025-12-04T10:35:20.8385691Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.xml - 2025-12-04T10:35:20.8385835Z =========================== short test summary info ============================ 2025-12-04T10:35:20.8386644Z FAILED [0.8208s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.8386658Z 2025-12-04T10:35:20.8386766Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8386867Z Traceback (most recent call last): 2025-12-04T10:35:20.8387350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.8387438Z result = job() 2025-12-04T10:35:20.8387948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.8388072Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.8388545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.8388650Z self._precompile_worker() 2025-12-04T10:35:20.8389160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8389312Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8389839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8390010Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8390390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T10:35:20.8390601Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8390972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8391265Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8391464Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8391725Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8391810Z ^ 2025-12-04T10:35:20.8392194Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8392199Z 2025-12-04T10:35:20.8392205Z 2025-12-04T10:35:20.8392826Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8392871Z 2025-12-04T10:35:20.8392875Z 2025-12-04T10:35:20.8393057Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8393738Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8393750Z 2025-12-04T10:35:20.8393972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8394170Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.8394361Z ============= 1 failed, 2 passed, 53 deselected, 2 rerun in 5.17s ============== 2025-12-04T10:35:20.8394443Z Got exit code 1 2025-12-04T10:35:20.8394536Z Retrying single test... 
2025-12-04T10:35:20.8395017Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.xml 2025-12-04T10:35:20.8395152Z ============================= test session starts ============================== 2025-12-04T10:35:20.8395458Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.8395547Z cachedir: .pytest_cache 2025-12-04T10:35:20.8396041Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.8396153Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.8396239Z configfile: pytest.ini 2025-12-04T10:35:20.8396699Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.8396895Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.8397501Z stepcurrent: skipping 55 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8397603Z Running 1 items in this shard 2025-12-04T10:35:20.8397610Z 2025-12-04T10:35:20.8398582Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8399226Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8399695Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8400173Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8400602Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8400975Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8401445Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8401867Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8402352Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8402729Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 
2025-12-04T10:35:20.8403211Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8403663Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8404168Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8404476Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8405958Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8406451Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8407349Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8408134Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8408959Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T10:35:20.8409705Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8410613Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8411279Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8411799Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8412442Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8412758Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8413523Z E1204 10:30:40.028000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8413637Z ('RERUN', {'yellow': True}) [2.2308s] [100%] 2025-12-04T10:35:20.8414729Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8415370Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8415883Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8416371Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8416852Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8417225Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8417687Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8418135Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8418627Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8419115Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8419611Z E1204 10:30:40.609000 91241 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8420063Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8420533Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8420843Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8422264Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8422733Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8423628Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8424167Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8425018Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8425613Z E1204 10:30:40.609000 91241 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8426419Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8427128Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8427702Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8428387Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8428768Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8429585Z E1204 10:30:40.609000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8429715Z ('RERUN', {'yellow': True}) [0.5482s] [100%] 2025-12-04T10:35:20.8430882Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8431519Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8432027Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8432507Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8432938Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8433302Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8433765Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8434144Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8434627Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8435010Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8435490Z E1204 10:30:41.154000 91241 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8435989Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8436462Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8436764Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8438193Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8438698Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8439598Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8440129Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8440899Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8441521Z E1204 10:30:41.154000 91241 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8442274Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8443008Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8443566Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8444215Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8444524Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8445297Z E1204 10:30:41.154000 91241 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8445383Z FAILED [0.5429s] [100%] 2025-12-04T10:35:20.8445390Z 2025-12-04T10:35:20.8445505Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.8445775Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.8445883Z Traceback (most recent call last): 2025-12-04T10:35:20.8446225Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8446357Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8446770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8446985Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8447422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8447583Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8448026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8448143Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8448604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8448875Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8449317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8449446Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8449896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.8449994Z return self._compile_to_module() 2025-12-04T10:35:20.8450411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8450546Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8450987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8451100Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8451525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8451771Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8452268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8452378Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8452823Z File "/tmp/tmpvkc44bpz/bu/cbutcjz7uyiptowv62ao6jtzlcwuiuqbhwqfxlslc23cblleqf5s.py", line 48, in 2025-12-04T10:35:20.8453261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8453358Z kernel.precompile( 2025-12-04T10:35:20.8453834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8453968Z self._precompile_worker() 2025-12-04T10:35:20.8454485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8454636Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8455152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8455322Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T10:35:20.8455707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8455934Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8456344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8456634Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8456827Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8457089Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8457168Z ^ 2025-12-04T10:35:20.8457556Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8457561Z 2025-12-04T10:35:20.8458179Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8458186Z 2025-12-04T10:35:20.8458190Z 2025-12-04T10:35:20.8458371Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8459154Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8459172Z 2025-12-04T10:35:20.8459401Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8459583Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8459672Z frames [('total', 1)] 2025-12-04T10:35:20.8459768Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8459972Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8460212Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8460296Z graph_break [] 2025-12-04T10:35:20.8460569Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.8460675Z Traceback (most recent call last): 2025-12-04T10:35:20.8461015Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8461149Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8461561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8461816Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8462259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8462418Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8462854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8463017Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8463471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8463746Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8464227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8464350Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8464760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8464856Z return self._compile_to_module() 2025-12-04T10:35:20.8465273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8465406Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8465848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8465960Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8466377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8466587Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8467083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8467188Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8467632Z File "/tmp/tmpck8xdkiw/aq/caqur3gmqqhwrfsrqzn2p6dnz4e6sudvv6abescj6x3g2dbv7d5k.py", line 48, in 2025-12-04T10:35:20.8468032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T10:35:20.8468122Z kernel.precompile( 2025-12-04T10:35:20.8468611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8468707Z self._precompile_worker() 2025-12-04T10:35:20.8469229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8469379Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8469885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8470053Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8470481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8470696Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8471069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8471350Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8471557Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8471815Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8471926Z ^ 2025-12-04T10:35:20.8472319Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8472324Z 2025-12-04T10:35:20.8472931Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8472938Z 2025-12-04T10:35:20.8472942Z 2025-12-04T10:35:20.8473128Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8473854Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8473860Z 2025-12-04T10:35:20.8474125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8474309Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8474395Z frames [('total', 1)] 2025-12-04T10:35:20.8474494Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8474697Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8474882Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8474965Z graph_break [] 2025-12-04T10:35:20.8475144Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8475230Z frames [('total', 1)] 2025-12-04T10:35:20.8475325Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8475511Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8475729Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8475816Z graph_break [] 2025-12-04T10:35:20.8475964Z =================================== FAILURES =================================== 2025-12-04T10:35:20.8476238Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.8476340Z Traceback (most recent call last): 2025-12-04T10:35:20.8476681Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8476808Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8477223Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8477442Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8477884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8478040Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8478483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8478605Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8479070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8479338Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8479824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8479952Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8480360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8480471Z return self._compile_to_module() 2025-12-04T10:35:20.8480881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8481022Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8481475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8481653Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8482070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:20.8482276Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8482775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8482925Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8483361Z File "/tmp/tmpnv2y0ye5/dz/cdzykcvs42m7nwvtemnrgul3qaliq57ujfwnmfo5bsrzb3pfu6r7.py", line 48, in 2025-12-04T10:35:20.8483759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8483899Z kernel.precompile( 2025-12-04T10:35:20.8484375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8484476Z self._precompile_worker() 2025-12-04T10:35:20.8484988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8485143Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8485657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8485826Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8486254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8486467Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8486839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8487125Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8487320Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8487575Z def 
triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8487658Z ^ 2025-12-04T10:35:20.8488045Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8488050Z 2025-12-04T10:35:20.8488664Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8488668Z 2025-12-04T10:35:20.8488674Z 2025-12-04T10:35:20.8488858Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8489538Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8489551Z 2025-12-04T10:35:20.8489777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8489954Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8490088Z frames [('total', 1)] 2025-12-04T10:35:20.8490184Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8490384Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8490580Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8490664Z graph_break [] 2025-12-04T10:35:20.8490840Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8490936Z frames [('total', 1)] 2025-12-04T10:35:20.8491032Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8491220Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8491458Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8491535Z graph_break [] 2025-12-04T10:35:20.8491719Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T10:35:20.8491804Z frames [('total', 1)] 2025-12-04T10:35:20.8491896Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8492086Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8492283Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8492412Z graph_break [] 2025-12-04T10:35:20.8492985Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.xml - 2025-12-04T10:35:20.8493130Z =========================== short test summary info ============================ 2025-12-04T10:35:20.8493840Z FAILED [0.5429s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8494106Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8494174Z ^ 2025-12-04T10:35:20.8494576Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8494582Z 2025-12-04T10:35:20.8495187Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8495192Z 2025-12-04T10:35:20.8495196Z 2025-12-04T10:35:20.8495389Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8496076Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8496083Z 2025-12-04T10:35:20.8496317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8496468Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.8496636Z ================== 1 failed, 187 deselected, 2 rerun in 3.36s ================== 2025-12-04T10:35:20.8496723Z Got exit code 1 2025-12-04T10:35:20.8496817Z Retrying single test... 2025-12-04T10:35:20.8497225Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.xml 2025-12-04T10:35:20.8497371Z ============================= test session starts ============================== 2025-12-04T10:35:20.8497675Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.8497781Z cachedir: .pytest_cache 2025-12-04T10:35:20.8498230Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.8498340Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.8498439Z configfile: pytest.ini 2025-12-04T10:35:20.8498896Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.8499130Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.8499800Z stepcurrent: skipping 55 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8499901Z Running 1 items in this shard 2025-12-04T10:35:20.8499905Z 2025-12-04T10:35:20.8500883Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8501536Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8502056Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8502538Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8503016Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8503401Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8503877Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8504322Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8504814Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8505198Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 
2025-12-04T10:35:20.8505704Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8506203Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8506680Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8506993Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8508861Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8509334Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8510243Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8510791Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8511557Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T10:35:20.8512228Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8512981Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8513651Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8514225Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8514865Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8515186Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8516058Z E1204 10:30:50.735000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8516168Z ('RERUN', {'yellow': True}) [2.1800s] [100%] 2025-12-04T10:35:20.8517186Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8517843Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8518310Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8518791Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8519213Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8519584Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8520046Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8520425Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8520918Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8521292Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8521778Z E1204 10:30:51.313000 91423 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8522228Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8522691Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8523003Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8524498Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8525039Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8525940Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8526524Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8527284Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8527904Z E1204 10:30:51.313000 91423 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8528664Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8529357Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8529886Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8530522Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8530834Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8531599Z E1204 10:30:51.313000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8531707Z ('RERUN', {'yellow': True}) [0.5462s] [100%] 2025-12-04T10:35:20.8532674Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8533309Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8533776Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8534259Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8534680Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8535052Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8535514Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8535959Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8536467Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8536836Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8537322Z E1204 10:30:51.855000 91423 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8537770Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8538281Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8538583Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8540096Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8540599Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8541485Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8542021Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8542776Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8543358Z E1204 10:30:51.855000 91423 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8544111Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8544772Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8545293Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8545973Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8546288Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8547049Z E1204 10:30:51.855000 91423 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8547142Z FAILED [0.5408s] [100%] 2025-12-04T10:35:20.8547146Z 2025-12-04T10:35:20.8547264Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.8547572Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.8547676Z Traceback (most recent call last): 2025-12-04T10:35:20.8548019Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8548140Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8548550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8548759Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8549196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8549395Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8549826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8549945Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8550396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8550713Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8551152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8551313Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8551716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.8551816Z return self._compile_to_module() 2025-12-04T10:35:20.8552224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8552356Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8552793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8552903Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8553321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8553513Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8554015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8554116Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8554552Z File "/tmp/tmp9u3haqyo/q5/cq5ggv6tyjmwulg5umff2z5ftgbv4akghqktnjyylfhgqd7scnk3.py", line 48, in 2025-12-04T10:35:20.8554946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8555031Z kernel.precompile( 2025-12-04T10:35:20.8555506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8555600Z self._precompile_worker() 2025-12-04T10:35:20.8556115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8556261Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8556764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8556930Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T10:35:20.8557310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8557512Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8557932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8558211Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8558409Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8558666Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8558735Z ^ 2025-12-04T10:35:20.8559125Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8559130Z 2025-12-04T10:35:20.8559736Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8559784Z 2025-12-04T10:35:20.8559788Z 2025-12-04T10:35:20.8559969Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8560649Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8560654Z 2025-12-04T10:35:20.8560926Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8561103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8561186Z frames [('total', 1)] 2025-12-04T10:35:20.8561345Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8561543Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8561730Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8561814Z graph_break [] 2025-12-04T10:35:20.8562079Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.8562177Z Traceback (most recent call last): 2025-12-04T10:35:20.8562520Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8562639Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8563057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8563261Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8563695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8563864Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8564298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8564416Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8564864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8565135Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8565583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8565701Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8566109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8566205Z return self._compile_to_module() 2025-12-04T10:35:20.8566610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8566751Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8567187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8567291Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8567758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8567954Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8568458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8568561Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8568991Z File "/tmp/tmp6s2bahjm/sc/csciw34bkacuv2osa6cwp2teninfsi36h2hgq2muzcbxdy22dtxq.py", line 48, in 2025-12-04T10:35:20.8569383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T10:35:20.8569511Z kernel.precompile( 2025-12-04T10:35:20.8569981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8570084Z self._precompile_worker() 2025-12-04T10:35:20.8570591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8570782Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8571287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8571492Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8571874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8572077Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8572449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8572730Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8572920Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8573179Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8573251Z ^ 2025-12-04T10:35:20.8573636Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8573645Z 2025-12-04T10:35:20.8574253Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8574261Z 2025-12-04T10:35:20.8574264Z 2025-12-04T10:35:20.8574439Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8575117Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8575122Z 2025-12-04T10:35:20.8575344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8575524Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8575608Z frames [('total', 1)] 2025-12-04T10:35:20.8575702Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8575904Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8576088Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8576167Z graph_break [] 2025-12-04T10:35:20.8576343Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8576423Z frames [('total', 1)] 2025-12-04T10:35:20.8576514Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8576692Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8576885Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8576963Z graph_break [] 2025-12-04T10:35:20.8577125Z =================================== FAILURES =================================== 2025-12-04T10:35:20.8577390Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T10:35:20.8577499Z Traceback (most recent call last): 2025-12-04T10:35:20.8577835Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8577959Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8578366Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8578612Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8579099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8579256Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8579689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8579808Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8580305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8580580Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8581058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8581184Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8581595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8581689Z return self._compile_to_module() 2025-12-04T10:35:20.8582112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8582243Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8582678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8582787Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8583203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:20.8583402Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8583906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8584009Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8584440Z File "/tmp/tmp2kz9efe3/ft/cftx3qhvy2fdl5dt5qnijw2cononx6e2pf346dkblcuqnkzclkf3.py", line 48, in 2025-12-04T10:35:20.8584832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8584915Z kernel.precompile( 2025-12-04T10:35:20.8585398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8585492Z self._precompile_worker() 2025-12-04T10:35:20.8586047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8586195Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8586701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8586865Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8587240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8587489Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8587867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8588145Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8588337Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8588599Z def 
triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8588671Z ^ 2025-12-04T10:35:20.8589104Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8589109Z 2025-12-04T10:35:20.8589708Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8589712Z 2025-12-04T10:35:20.8589719Z 2025-12-04T10:35:20.8589903Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8590617Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8590622Z 2025-12-04T10:35:20.8590846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8591062Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8591143Z frames [('total', 1)] 2025-12-04T10:35:20.8591242Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8591438Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8591622Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8591702Z graph_break [] 2025-12-04T10:35:20.8591881Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8591961Z frames [('total', 1)] 2025-12-04T10:35:20.8592055Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8592235Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8592437Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8592511Z graph_break [] 2025-12-04T10:35:20.8592683Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T10:35:20.8592766Z frames [('total', 1)] 2025-12-04T10:35:20.8592855Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8593036Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8593229Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8593307Z graph_break [] 2025-12-04T10:35:20.8593862Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.xml - 2025-12-04T10:35:20.8594003Z =========================== short test summary info ============================ 2025-12-04T10:35:20.8594664Z FAILED [0.5408s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8594924Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8594993Z ^ 2025-12-04T10:35:20.8595377Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8595386Z 2025-12-04T10:35:20.8595988Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8595993Z 2025-12-04T10:35:20.8595997Z 2025-12-04T10:35:20.8596174Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8596897Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8596902Z 2025-12-04T10:35:20.8597125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8597273Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.8597441Z ================== 1 failed, 187 deselected, 2 rerun in 3.30s ================== 2025-12-04T10:35:20.8597518Z Got exit code 1 2025-12-04T10:35:20.8597985Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T10:35:20.8598460Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.8598862Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.xml 2025-12-04T10:35:20.8598995Z ============================= test session starts ============================== 2025-12-04T10:35:20.8599349Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.8599443Z cachedir: .pytest_cache 2025-12-04T10:35:20.8599886Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.8600030Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.8600119Z configfile: pytest.ini 2025-12-04T10:35:20.8600575Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.8600772Z collecting ... collected 188 items / 56 deselected / 132 selected 2025-12-04T10:35:20.8600886Z stepcurrent: skipping 56 already run items. 
2025-12-04T10:35:20.8600976Z Running 132 items in this shard 2025-12-04T10:35:20.8600980Z 2025-12-04T10:35:20.8601983Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8602623Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8603091Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8603568Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8603991Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8604361Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8604825Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8605204Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8605685Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8606058Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8606540Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 
triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8607029Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8607499Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8608058Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8609495Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8610056Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8611070Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8611646Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8612514Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8613140Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8613948Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8614658Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8615216Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8615937Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8616293Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8617112Z E1204 10:31:01.541000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8617233Z ('RERUN', {'yellow': True}) [2.1125s] [ 0%] 2025-12-04T10:35:20.8618290Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8618982Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8619496Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8619976Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8620456Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8620824Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8621286Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8621667Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8622191Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8622564Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8623045Z E1204 10:31:02.135000 91605 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8623535Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8623997Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8624345Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8625773Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8626235Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8627127Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8627663Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8628422Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8628999Z E1204 10:31:02.135000 91605 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8629755Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8630413Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8630938Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8631574Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8631924Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8632689Z E1204 10:31:02.135000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8632796Z ('RERUN', {'yellow': True}) [0.5621s] [ 0%] 2025-12-04T10:35:20.8633786Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8634460Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8634924Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8635400Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8635857Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8636279Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8636780Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8637160Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8637639Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8638013Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8638496Z E1204 10:31:02.704000 91605 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8638942Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8639409Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8639713Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8641137Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8641603Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8642491Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8643029Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8643827Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8644414Z E1204 10:31:02.704000 91605 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8645163Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8645828Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8646464Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8647100Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8647451Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8648212Z E1204 10:31:02.704000 91605 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8648337Z FAILED [0.5672s] [ 0%] 2025-12-04T10:35:20.8648342Z 2025-12-04T10:35:20.8648456Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.8648736Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8648838Z Traceback (most recent call last): 2025-12-04T10:35:20.8649174Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8649297Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8649707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8649916Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8650352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8650516Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8650952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8651071Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8651522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8651792Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8652233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8652353Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8652760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.8652859Z return self._compile_to_module() 2025-12-04T10:35:20.8653271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8653403Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8653840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8657681Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8658194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8658398Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8658907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8659013Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8659524Z File "/tmp/tmpq0flfvi1/5w/c5w2caww7qw3cyy7psustrl2ltcfrwhgdetbf5cqse2ozgbeny5k.py", line 48, in <module> 2025-12-04T10:35:20.8659924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8660059Z kernel.precompile( 2025-12-04T10:35:20.8660545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8660644Z self._precompile_worker() 2025-12-04T10:35:20.8661162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8661309Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8661856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8662036Z binary = triton.compile(*compile_args,
**compile_kwargs) 2025-12-04T10:35:20.8662459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8662675Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8663050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8663332Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8663528Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8663785Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8663855Z ^ 2025-12-04T10:35:20.8664249Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8664256Z 2025-12-04T10:35:20.8664863Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8664871Z 2025-12-04T10:35:20.8664875Z 2025-12-04T10:35:20.8665066Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8665763Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8665768Z 2025-12-04T10:35:20.8666000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8666183Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8666269Z frames [('total', 1)] 2025-12-04T10:35:20.8666375Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8666577Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8666763Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8666854Z graph_break [] 2025-12-04T10:35:20.8667132Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8667241Z Traceback (most recent call last): 2025-12-04T10:35:20.8667584Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8667704Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8668122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8668376Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8668816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8668987Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8669420Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8669544Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8669998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8670316Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8670762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8670882Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8671293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8671432Z return self._compile_to_module() 2025-12-04T10:35:20.8671841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8672017Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8672454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8672561Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8672985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8673182Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8673687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8673789Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8674226Z File "/tmp/tmph40w8cnw/pf/cpfiyr5f7fc3iejhwarchvpj45snfoajypagfyswrxkahe2jdhlh.py", line 48, in <module> 2025-12-04T10:35:20.8674631Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8674725Z kernel.precompile( 2025-12-04T10:35:20.8675204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8675301Z self._precompile_worker() 2025-12-04T10:35:20.8675831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8676008Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8676517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8676683Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8677066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8677273Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8677653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8677935Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8678129Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8678389Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8678457Z ^ 2025-12-04T10:35:20.8678898Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8678904Z 2025-12-04T10:35:20.8679514Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8679519Z 2025-12-04T10:35:20.8679523Z 2025-12-04T10:35:20.8679701Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8680397Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8680442Z 2025-12-04T10:35:20.8680665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8680846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8680930Z frames [('total', 1)] 2025-12-04T10:35:20.8681028Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8681234Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8681469Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8681556Z graph_break [] 2025-12-04T10:35:20.8681729Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8681810Z frames [('total', 1)] 2025-12-04T10:35:20.8681953Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8682135Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8682327Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8682411Z graph_break [] 2025-12-04T10:35:20.8682527Z =================================== FAILURES =================================== 2025-12-04T10:35:20.8682805Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8682911Z Traceback (most recent call last): 2025-12-04T10:35:20.8683255Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8683384Z y_compiled = compiled_fp8_cast(x, dst_dtype) 
2025-12-04T10:35:20.8683800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8684007Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8684452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8684616Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8685052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8685169Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8685623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8685905Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8686349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8686468Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8686880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8686982Z return self._compile_to_module() 2025-12-04T10:35:20.8687401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8687538Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8687972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8688153Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8688576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in 
load_by_key_path 2025-12-04T10:35:20.8688778Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8689274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8689380Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8689810Z File "/tmp/tmpj22crl5u/ls/cls5hdx2u5fkcjbxp6gkqmeb3atdj64t735b5h4xksok2uhuhdww.py", line 48, in <module> 2025-12-04T10:35:20.8690243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8690334Z kernel.precompile( 2025-12-04T10:35:20.8690811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8690906Z self._precompile_worker() 2025-12-04T10:35:20.8691464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8691612Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8692115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8692323Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8692707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8692920Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8693404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8693786Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8694049Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8694392Z def
triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8694495Z ^ 2025-12-04T10:35:20.8694902Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8694910Z 2025-12-04T10:35:20.8695521Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8695530Z 2025-12-04T10:35:20.8695533Z 2025-12-04T10:35:20.8695723Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8696412Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8696417Z 2025-12-04T10:35:20.8696643Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8696823Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8696903Z frames [('total', 1)] 2025-12-04T10:35:20.8697001Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8697203Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8697386Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8697471Z graph_break [] 2025-12-04T10:35:20.8697646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8697736Z frames [('total', 1)] 2025-12-04T10:35:20.8697831Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8698015Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8698278Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8698359Z graph_break [] 2025-12-04T10:35:20.8698532Z ----------------------------- Captured stdout 
call ----------------------------- 2025-12-04T10:35:20.8698623Z frames [('total', 1)] 2025-12-04T10:35:20.8698716Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8698895Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8699159Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8699242Z graph_break [] 2025-12-04T10:35:20.8699805Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.xml - 2025-12-04T10:35:20.8699993Z =========================== short test summary info ============================ 2025-12-04T10:35:20.8700675Z FAILED [0.5672s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8700941Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8701009Z ^ 2025-12-04T10:35:20.8701452Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8701457Z 2025-12-04T10:35:20.8702059Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8702102Z 2025-12-04T10:35:20.8702106Z 2025-12-04T10:35:20.8702293Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8702990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8702994Z 2025-12-04T10:35:20.8703218Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8703377Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.8703547Z ================== 1 failed, 56 deselected, 2 rerun in 3.28s =================== 2025-12-04T10:35:20.8703625Z Got exit code 1 2025-12-04T10:35:20.8703720Z Retrying single test... 2025-12-04T10:35:20.8704119Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.xml 2025-12-04T10:35:20.8704262Z ============================= test session starts ============================== 2025-12-04T10:35:20.8704558Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.8704651Z cachedir: .pytest_cache 2025-12-04T10:35:20.8705104Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.8705207Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.8705298Z configfile: pytest.ini 2025-12-04T10:35:20.8705768Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.8705958Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.8706581Z stepcurrent: skipping 56 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8706675Z Running 1 items in this shard 2025-12-04T10:35:20.8706680Z 2025-12-04T10:35:20.8707918Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8708651Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8709120Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8709604Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8710029Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8710399Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8710920Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8711301Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8711791Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8712214Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 
2025-12-04T10:35:20.8712708Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8713210Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8713673Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8713989Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8715433Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8715896Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8716797Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8717339Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8718096Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T10:35:20.8718682Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8719437Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8720103Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8720679Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8721318Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8721632Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8722390Z E1204 10:31:12.310000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8722539Z ('RERUN', {'yellow': True}) [2.0963s] [100%] 2025-12-04T10:35:20.8723532Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8724204Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8724680Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8725227Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8725654Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8726023Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8726488Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8726869Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8727350Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8727733Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8728211Z E1204 10:31:12.905000 91787 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8728665Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8729135Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8729438Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8730872Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8731331Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8732273Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8732808Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8733568Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8734156Z E1204 10:31:12.905000 91787 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8734945Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8735613Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8736223Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8736869Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8737215Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8737994Z E1204 10:31:12.905000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8738103Z ('RERUN', {'yellow': True}) [0.5626s] [100%] 2025-12-04T10:35:20.8739136Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T10:35:20.8739779Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8740243Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8740733Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8741153Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T10:35:20.8741519Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T10:35:20.8741993Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:20.8742369Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T10:35:20.8742863Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T10:35:20.8743231Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T10:35:20.8743715Z E1204 10:31:13.476000 91787 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T10:35:20.8744223Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T10:35:20.8744689Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T10:35:20.8744999Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T10:35:20.8746426Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.8746933Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T10:35:20.8747866Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8748406Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8749200Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8749778Z E1204 10:31:13.476000 91787 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8750537Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8751194Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8751722Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:20.8752359Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8752677Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T10:35:20.8753436Z E1204 10:31:13.476000 91787 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8753521Z FAILED [0.5700s] [100%] 2025-12-04T10:35:20.8753525Z 2025-12-04T10:35:20.8753651Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.8753929Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8754039Z Traceback (most recent call last): 2025-12-04T10:35:20.8754385Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8754506Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8754932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8755138Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8755619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8755787Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8756221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8756343Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8756793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8757067Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8757555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8757677Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8758098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T10:35:20.8758198Z return self._compile_to_module() 2025-12-04T10:35:20.8758651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8758795Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8759237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8759390Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8759817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8760014Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8760524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8760633Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8761080Z File "/tmp/tmpkcnluhrz/dq/cdqy4cnsahg4ljqdh4nqllbg7ybyholsnncxdr5ckvmm6yoaodzl.py", line 48, in 2025-12-04T10:35:20.8761490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8761584Z kernel.precompile( 2025-12-04T10:35:20.8762061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8762157Z self._precompile_worker() 2025-12-04T10:35:20.8762664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8762822Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8763333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8763504Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T10:35:20.8763890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8764096Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8764477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8764764Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8764956Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8765226Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8765302Z ^ 2025-12-04T10:35:20.8765691Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8765704Z 2025-12-04T10:35:20.8766364Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8766370Z 2025-12-04T10:35:20.8766374Z 2025-12-04T10:35:20.8766560Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8767260Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8767267Z 2025-12-04T10:35:20.8767495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8767724Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8767809Z frames [('total', 1)] 2025-12-04T10:35:20.8767903Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8768118Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8768308Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8768385Z graph_break [] 2025-12-04T10:35:20.8768678Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8768851Z Traceback (most recent call last): 2025-12-04T10:35:20.8769196Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8769317Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T10:35:20.8769773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8769992Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8770429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8770595Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8771030Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8771151Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8771619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8771889Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8772335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8772466Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8772875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8772985Z return self._compile_to_module() 2025-12-04T10:35:20.8773395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8773528Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8773972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8774078Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8774506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8774701Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8775198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8775312Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8775748Z File "/tmp/tmp820zqs63/ic/cicqod7q7sby7qdsaeiquh7rxvgjs7ril7n3gxahaeewuobzbbus.py", line 48, in 2025-12-04T10:35:20.8776322Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.8776416Z kernel.precompile( 2025-12-04T10:35:20.8776891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.8776989Z self._precompile_worker() 2025-12-04T10:35:20.8777493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8777642Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8778196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8778358Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8778741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8778951Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8779426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8779717Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8779911Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:20.8780216Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8780296Z ^ 2025-12-04T10:35:20.8780684Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:20.8780692Z 2025-12-04T10:35:20.8781304Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8781309Z 2025-12-04T10:35:20.8781315Z 2025-12-04T10:35:20.8781498Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8782202Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T10:35:20.8782207Z 2025-12-04T10:35:20.8782434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8782618Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8782710Z frames [('total', 1)] 2025-12-04T10:35:20.8782810Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8783009Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8783199Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8783278Z graph_break [] 2025-12-04T10:35:20.8783459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8783546Z frames [('total', 1)] 2025-12-04T10:35:20.8783635Z stats [('calls_captured', 8)] 2025-12-04T10:35:20.8783831Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T10:35:20.8784027Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8784105Z graph_break [] 2025-12-04T10:35:20.8784232Z =================================== FAILURES =================================== 2025-12-04T10:35:20.8784515Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T10:35:20.8784618Z Traceback (most recent call last): 2025-12-04T10:35:20.8784962Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T10:35:20.8785082Z y_compiled = compiled_fp8_cast(x, dst_dtype) 
2025-12-04T10:35:20.8785500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8785752Z     raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8786242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8786407Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8786837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8786966Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8787414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8787724Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8788171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8788292Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8788711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8788848Z     return self._compile_to_module()
2025-12-04T10:35:20.8789258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8789439Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8789873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8789980Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8790405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8790593Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8791096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8791199Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8791632Z   File "/tmp/tmpjm0ifcgj/db/cdbjs3al7zf5ry4erh6tyo76syihbnegt37mrf3tcwqxlsjtr23n.py", line 48, in <module>
2025-12-04T10:35:20.8792025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8792115Z     kernel.precompile(
2025-12-04T10:35:20.8792591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8792684Z     self._precompile_worker()
2025-12-04T10:35:20.8793188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8793340Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8793843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8794010Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8794397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8794599Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8794977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8795259Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8795447Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8795731Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8795815Z ^
2025-12-04T10:35:20.8796253Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8796264Z
2025-12-04T10:35:20.8796960Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8796965Z
2025-12-04T10:35:20.8796969Z
2025-12-04T10:35:20.8797147Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8797839Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8797891Z
2025-12-04T10:35:20.8798111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8798293Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8798372Z frames [('total', 1)]
2025-12-04T10:35:20.8798467Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8798673Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8798895Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8798974Z graph_break []
2025-12-04T10:35:20.8799152Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8799236Z frames [('total', 1)]
2025-12-04T10:35:20.8799381Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8799558Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8799755Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8799836Z graph_break []
2025-12-04T10:35:20.8800009Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8800090Z frames [('total', 1)]
2025-12-04T10:35:20.8800182Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8800366Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8800562Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8800639Z graph_break []
2025-12-04T10:35:20.8801194Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.xml -
2025-12-04T10:35:20.8801334Z =========================== short test summary info ============================
2025-12-04T10:35:20.8802011Z FAILED [0.5700s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8802272Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8802342Z ^
2025-12-04T10:35:20.8802726Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8802730Z
2025-12-04T10:35:20.8803340Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8803345Z
2025-12-04T10:35:20.8803351Z
2025-12-04T10:35:20.8803528Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8804212Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8804222Z
2025-12-04T10:35:20.8804442Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8804590Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8804766Z ================== 1 failed, 187 deselected, 2 rerun in 3.26s ==================
2025-12-04T10:35:20.8804844Z Got exit code 1
2025-12-04T10:35:20.8804932Z Retrying single test...
2025-12-04T10:35:20.8805380Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.xml
2025-12-04T10:35:20.8805517Z ============================= test session starts ==============================
2025-12-04T10:35:20.8805812Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8805897Z cachedir: .pytest_cache
2025-12-04T10:35:20.8806346Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8806451Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8806605Z configfile: pytest.ini
2025-12-04T10:35:20.8807071Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8807255Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.8808038Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8808132Z Running 1 items in this shard
2025-12-04T10:35:20.8808207Z
2025-12-04T10:35:20.8809267Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8810014Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8810515Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8811029Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8811482Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8811875Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8812370Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8812772Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8813292Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8813693Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8814206Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8814687Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8815181Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8815515Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.8817136Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8817599Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8818497Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8819073Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8819896Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8820473Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8821267Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8821928Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8822501Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8823148Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8823456Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8824225Z E1204 10:31:23.109000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8824331Z ('RERUN', {'yellow': True}) [2.1009s] [100%]
2025-12-04T10:35:20.8825318Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8825950Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8826411Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8826895Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8827318Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8827696Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8828156Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8828530Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8829062Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8829435Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8829924Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8830370Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8830841Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8831190Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.8832655Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8833124Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8834052Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8834592Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8835348Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8835983Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8836735Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8837398Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8837916Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8838552Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8838863Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8839617Z E1204 10:31:23.707000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8839729Z ('RERUN', {'yellow': True}) [0.5651s] [100%]
2025-12-04T10:35:20.8840710Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0
2025-12-04T10:35:20.8841381Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8841849Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8842327Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8842750Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     xmask = xindex < xnumel
2025-12-04T10:35:20.8843153Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     x0 = xindex
2025-12-04T10:35:20.8843616Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:20.8843990Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp1 = -448.0
2025-12-04T10:35:20.8844513Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp2 = triton_helpers.maximum(tmp0, tmp1)
2025-12-04T10:35:20.8844895Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp3 = 448.0
2025-12-04T10:35:20.8845373Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp4 = triton_helpers.minimum(tmp2, tmp3)
2025-12-04T10:35:20.8845866Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tmp5 = tmp4.to(tl.float8e4nv)
2025-12-04T10:35:20.8846380Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     tl.store(out_ptr0 + (x0), tmp5, xmask)
2025-12-04T10:35:20.8846679Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]
2025-12-04T10:35:20.8848106Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8848566Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last):
2025-12-04T10:35:20.8849458Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8849989Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8850754Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8851328Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8852081Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8852734Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1]     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8853318Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0:
2025-12-04T10:35:20.8853958Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8854262Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^
2025-12-04T10:35:20.8855025Z E1204 10:31:24.276000 91969 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8855146Z FAILED [0.5674s] [100%]
2025-12-04T10:35:20.8855150Z
2025-12-04T10:35:20.8855268Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8855547Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8855646Z Traceback (most recent call last):
2025-12-04T10:35:20.8855989Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8856155Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8856569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8856830Z     raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8857261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8857421Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8857854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8857973Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8858438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8858708Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8859195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8859324Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8859729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8859832Z     return self._compile_to_module()
2025-12-04T10:35:20.8860238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8860368Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8860808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8860911Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8861330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8861528Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8862025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8862129Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8862991Z   File "/tmp/tmp4_w4wfhy/zg/czg6fy373v4kzjajvi2xsq24v33lkfuimgpa4bzhv4ehj4bx3ne2.py", line 48, in <module>
2025-12-04T10:35:20.8863385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8863477Z     kernel.precompile(
2025-12-04T10:35:20.8864044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8864144Z     self._precompile_worker()
2025-12-04T10:35:20.8864653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8864802Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8865316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8865478Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8865943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8866158Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8866530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8866825Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8867056Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8867314Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8867392Z ^
2025-12-04T10:35:20.8867819Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8867824Z
2025-12-04T10:35:20.8868433Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8868440Z
2025-12-04T10:35:20.8868444Z
2025-12-04T10:35:20.8868626Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8869323Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8869328Z
2025-12-04T10:35:20.8869549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8869727Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8869811Z frames [('total', 1)]
2025-12-04T10:35:20.8869906Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8870102Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8870287Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8870367Z graph_break []
2025-12-04T10:35:20.8870647Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8870745Z Traceback (most recent call last):
2025-12-04T10:35:20.8871090Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8871217Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8871630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8871835Z     raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8872503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8872675Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8873113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8873231Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8873681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8874007Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8874448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8874572Z     compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8874976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8875075Z     return self._compile_to_module()
2025-12-04T10:35:20.8875488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8875666Z     mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8876099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8876210Z     mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8876629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8876824Z     mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8877360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8877463Z     exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8877954Z   File "/tmp/tmpoeoji2t9/pw/cpwckhent4frjpzfsvfcqqjiwoshhlizh2s5yaa5id7hop6iyw5b.py", line 48, in <module>
2025-12-04T10:35:20.8878342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8878438Z     kernel.precompile(
2025-12-04T10:35:20.8878907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8879000Z     self._precompile_worker()
2025-12-04T10:35:20.8879518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8879664Z     compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8880167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8880336Z     binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8880710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8880915Z     module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8881283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8881560Z     return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8881759Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8882013Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8882085Z ^
2025-12-04T10:35:20.8882471Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8882476Z
2025-12-04T10:35:20.8883078Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8883086Z
2025-12-04T10:35:20.8883091Z
2025-12-04T10:35:20.8883274Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8883958Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8883963Z
2025-12-04T10:35:20.8884242Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8884422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8884505Z frames [('total', 1)]
2025-12-04T10:35:20.8884604Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8884806Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8885005Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8885084Z graph_break []
2025-12-04T10:35:20.8885259Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8885387Z frames [('total', 1)]
2025-12-04T10:35:20.8885478Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8885677Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8885908Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8885986Z graph_break []
2025-12-04T10:35:20.8886105Z =================================== FAILURES ===================================
2025-12-04T10:35:20.8886385Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _
2025-12-04T10:35:20.8886529Z Traceback (most recent call last):
2025-12-04T10:35:20.8886883Z   File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated
2025-12-04T10:35:20.8887069Z     y_compiled = compiled_fp8_cast(x, dst_dtype)
2025-12-04T10:35:20.8887479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8887694Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8888136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8888300Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8888743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8888860Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8889314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8889583Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8890028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8890152Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8890561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8890667Z return self._compile_to_module()
2025-12-04T10:35:20.8891082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8891214Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8891655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8891758Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8892181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8892377Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8892874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8892984Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8893413Z File "/tmp/tmp9088zri0/jt/cjtdplg4g3nym6jmjgsimctqrfgfr62t2lqpd6quku2hp6gzqcfz.py", line 48, in <module>
2025-12-04T10:35:20.8893855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8893948Z kernel.precompile(
2025-12-04T10:35:20.8894422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8894521Z self._precompile_worker()
2025-12-04T10:35:20.8895025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8895176Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8895730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8895892Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8896275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8896485Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8896898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8897186Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8897376Z torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8897671Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8897746Z ^
2025-12-04T10:35:20.8898132Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8898139Z
2025-12-04T10:35:20.8898750Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8898754Z
2025-12-04T10:35:20.8898761Z
2025-12-04T10:35:20.8898938Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8899688Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8899693Z
2025-12-04T10:35:20.8899915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8900095Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8900181Z frames [('total', 1)]
2025-12-04T10:35:20.8900278Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8900474Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8900660Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8900737Z graph_break []
2025-12-04T10:35:20.8900920Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8901002Z frames [('total', 1)]
2025-12-04T10:35:20.8901091Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8901282Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8901475Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8901550Z graph_break []
2025-12-04T10:35:20.8901726Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8901808Z frames [('total', 1)]
2025-12-04T10:35:20.8901895Z stats [('calls_captured', 8)]
2025-12-04T10:35:20.8902076Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)]
2025-12-04T10:35:20.8902270Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8902350Z graph_break []
2025-12-04T10:35:20.8902901Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.xml -
2025-12-04T10:35:20.8903089Z =========================== short test summary info ============================
2025-12-04T10:35:20.8903775Z FAILED [0.5674s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0:
2025-12-04T10:35:20.8904033Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8904109Z ^
2025-12-04T10:35:20.8904495Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
2025-12-04T10:35:20.8904500Z
2025-12-04T10:35:20.8905158Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8905163Z
2025-12-04T10:35:20.8905167Z
2025-12-04T10:35:20.8905346Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8906058Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8906063Z
2025-12-04T10:35:20.8906354Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8906503Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.8906712Z ================== 1 failed, 187 deselected, 2 rerun in 3.27s ==================
2025-12-04T10:35:20.8906793Z Got exit code 1
2025-12-04T10:35:20.8907267Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda
2025-12-04T10:35:20.8907628Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.8908188Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.xml
2025-12-04T10:35:20.8908328Z ============================= test session starts ==============================
2025-12-04T10:35:20.8908627Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.8908712Z cachedir: .pytest_cache
2025-12-04T10:35:20.8909171Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.8913189Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.8913282Z configfile: pytest.ini
2025-12-04T10:35:20.8913755Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.8913951Z collecting ... collected 188 items / 57 deselected / 131 selected
2025-12-04T10:35:20.8914073Z stepcurrent: skipping 57 already run items.
2025-12-04T10:35:20.8914180Z Running 131 items in this shard
2025-12-04T10:35:20.8914185Z
2025-12-04T10:35:20.8914629Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda PASSED [2.3500s] [ 0%]
2025-12-04T10:35:20.8915079Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6600s] [ 1%]
2025-12-04T10:35:20.8916071Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.8916723Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8917196Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8917770Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8918200Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.8918561Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.8919069Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8919517Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.8920009Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.8920453Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.8920873Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.8921395Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.8921861Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.8922218Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.8923774Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.8924230Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.8924968Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.8925398Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:20.8926114Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.8926722Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.8927444Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.8927877Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:20.8928599Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.8929151Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.8929930Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.8930643Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.8931357Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.8931996Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.8932721Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.8933367Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.8934129Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.8934468Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.8935050Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.8935346Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.8935796Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.8936735Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8937267Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8938023Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8938596Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8939419Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8940077Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8940598Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.8941250Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8941708Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8942235Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8942653Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.8943020Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.8943521Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8944005Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.8944359Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.8945060Z E1204 10:31:34.973000 92151 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.8945180Z ('RERUN', {'yellow': True}) [0.2030s] [ 2%]
2025-12-04T10:35:20.8945728Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6007s] [ 2%]
2025-12-04T10:35:20.8946160Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 FAILED [0.5834s] [ 2%]
2025-12-04T10:35:20.8946287Z
2025-12-04T10:35:20.8946416Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.8946691Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.8946800Z Traceback (most recent call last):
2025-12-04T10:35:20.8947110Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.8947217Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.8947641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8947849Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8948282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8948453Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8948883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8949016Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8949473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8949749Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8950197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8950319Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8950727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8950826Z return self._compile_to_module()
2025-12-04T10:35:20.8951243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8951385Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8951831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8951939Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8952413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8952613Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8953126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8953231Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.8953650Z File "/tmp/tmp_dcbdldv/ft/cftymixla4jkfzzdywjnejngcxzvb4g2mbvpgv5nfcwizkzgc37a.py", line 51, in <module>
2025-12-04T10:35:20.8954067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.8954199Z kernel.precompile(
2025-12-04T10:35:20.8954697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.8954798Z self._precompile_worker()
2025-12-04T10:35:20.8955313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.8955481Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.8956086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.8956253Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.8956690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.8956905Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.8957290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.8957576Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.8957773Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.8958063Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.8958167Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.8958296Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.8958389Z xmask = xindex < xnumel
2025-12-04T10:35:20.8958467Z x0 = xindex
2025-12-04T10:35:20.8958614Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.8958715Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.8958790Z ^
2025-12-04T10:35:20.8959125Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.8959133Z
2025-12-04T10:35:20.8959746Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.8959751Z
2025-12-04T10:35:20.8959754Z
2025-12-04T10:35:20.8959948Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.8960633Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.8960638Z
2025-12-04T10:35:20.8960867Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.8961054Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.8961141Z frames [('total', 1)]
2025-12-04T10:35:20.8961240Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.8961432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.8961833Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.8961923Z graph_break []
2025-12-04T10:35:20.8962197Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.8962352Z Traceback (most recent call last):
2025-12-04T10:35:20.8962679Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.8962789Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.8963209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.8963417Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.8963853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.8964066Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.8964497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.8964624Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.8965081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.8965394Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.8965868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.8966004Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.8966469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.8966585Z return self._compile_to_module()
2025-12-04T10:35:20.8966998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.8967141Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.8967582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.8967691Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.8968121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.8968313Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.8968815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.8968924Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8969353Z File "/tmp/tmprx7n17gc/f2/cf2luty6z37x7p7zbbfqpubgqy747cohumtlkgt74sdrgi7tsfeo.py", line 83, in 2025-12-04T10:35:20.8969743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.8969834Z self._wait_futures(scope) 2025-12-04T10:35:20.8970259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.8970359Z kernel = result.result() 2025-12-04T10:35:20.8970795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T10:35:20.8970929Z return self.result_fn() 2025-12-04T10:35:20.8971454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.8971600Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.8972038Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.8972048Z 2025-12-04T10:35:20.8972188Z Name=triton_poi_fused__to_copy_0 2025-12-04T10:35:20.8972293Z Traceback (most recent call last): 2025-12-04T10:35:20.8972648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.8972743Z return fn(*args, **kwargs) 2025-12-04T10:35:20.8973148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.8973381Z return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.8973726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.8973818Z return fn(*args, **kwargs) 2025-12-04T10:35:20.8974159Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.8974337Z return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.8974773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.8975097Z self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.8975442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.8975674Z return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.8976088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.8976301Z raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.8976684Z ValueError: type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.8976729Z 2025-12-04T10:35:20.8976948Z The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.8976955Z 2025-12-04T10:35:20.8977060Z Traceback (most recent call last): 2025-12-04T10:35:20.8977515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.8977603Z result = job() 2025-12-04T10:35:20.8978106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.8978234Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.8978705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.8978799Z self._precompile_worker() 2025-12-04T10:35:20.8979393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.8979540Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.8980053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.8980215Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.8980596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.8980814Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.8981188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.8981469Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.8981635Z triton.compiler.errors.CompilationError: at 7:11: 
2025-12-04T10:35:20.8981899Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.8982007Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.8982119Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.8982204Z xmask = xindex < xnumel 2025-12-04T10:35:20.8982284Z x0 = xindex 2025-12-04T10:35:20.8982427Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.8982524Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.8982650Z ^ 2025-12-04T10:35:20.8982982Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.8982986Z 2025-12-04T10:35:20.8982990Z 2025-12-04T10:35:20.8983606Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.8983614Z 2025-12-04T10:35:20.8983618Z 2025-12-04T10:35:20.8983801Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.8984493Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.8984539Z 2025-12-04T10:35:20.8984766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.8985040Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.8985138Z frames [('total', 1)] 2025-12-04T10:35:20.8985233Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.8985419Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.8985865Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.8985946Z graph_break [] 2025-12-04T10:35:20.8986178Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:35:20.8986264Z frames [('total', 1)] 2025-12-04T10:35:20.8986357Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.8986547Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.8987043Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.8987122Z graph_break [] 2025-12-04T10:35:20.8987253Z =================================== FAILURES =================================== 2025-12-04T10:35:20.8987528Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.8987642Z Traceback (most recent call last): 2025-12-04T10:35:20.8987954Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.8988057Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.8988483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.8988694Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.8989133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.8989297Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.8989729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.8989856Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.8990311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.8990579Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.8991029Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.8991151Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.8991565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.8991664Z return self._compile_to_module() 2025-12-04T10:35:20.8992073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.8992260Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.8992702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.8992810Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.8993232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.8993430Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.8993939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.8994086Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.8994508Z File "/tmp/tmp8a4fclai/ud/cud2d6u6wlihn5nlz5bqo3u4g57uq3ib3xif7lvvtazv7l5l6as7.py", line 83, in 2025-12-04T10:35:20.8994902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T10:35:20.8994998Z self._wait_futures(scope) 2025-12-04T10:35:20.8995464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T10:35:20.8995562Z kernel = result.result() 2025-12-04T10:35:20.8995936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 
2025-12-04T10:35:20.8996073Z return self.result_fn() 2025-12-04T10:35:20.8996478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T10:35:20.8996589Z raise e.with_name(kernel_name) from e 2025-12-04T10:35:20.8996921Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.8996926Z 2025-12-04T10:35:20.8997023Z Name=triton_poi_fused__to_copy_0 2025-12-04T10:35:20.8997131Z Traceback (most recent call last): 2025-12-04T10:35:20.8997482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.8997575Z return fn(*args, **kwargs) 2025-12-04T10:35:20.8997914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.8998137Z return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.8998489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.8998581Z return fn(*args, **kwargs) 2025-12-04T10:35:20.8998923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.8999105Z return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.8999463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.8999790Z self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9000136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9000353Z return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9000702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 
574, in to_ir 2025-12-04T10:35:20.9000911Z raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9001292Z ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9001298Z 2025-12-04T10:35:20.9001507Z The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9001511Z 2025-12-04T10:35:20.9001613Z Traceback (most recent call last): 2025-12-04T10:35:20.9002125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.9002205Z result = job() 2025-12-04T10:35:20.9002714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.9002841Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.9003314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.9003409Z self._precompile_worker() 2025-12-04T10:35:20.9003926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9004122Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9004637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9004804Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9005181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9005432Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9005838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in 
make_ir 2025-12-04T10:35:20.9006187Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9006346Z triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9006614Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9006720Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9006831Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9006917Z xmask = xindex < xnumel 2025-12-04T10:35:20.9006999Z x0 = xindex 2025-12-04T10:35:20.9007143Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9007246Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9007322Z ^ 2025-12-04T10:35:20.9007654Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9007659Z 2025-12-04T10:35:20.9007663Z 2025-12-04T10:35:20.9008450Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9008459Z 2025-12-04T10:35:20.9008465Z 2025-12-04T10:35:20.9008651Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9009339Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9009344Z 2025-12-04T10:35:20.9009568Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9009745Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9009837Z frames [('total', 1)] 2025-12-04T10:35:20.9009928Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9010120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9010522Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9010606Z graph_break [] 2025-12-04T10:35:20.9010789Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9010877Z frames [('total', 1)] 2025-12-04T10:35:20.9010973Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9011163Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9011771Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.9011856Z graph_break [] 2025-12-04T10:35:20.9012035Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9012114Z frames [('total', 1)] 2025-12-04T10:35:20.9012215Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9012396Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9012900Z inductor 
[('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T10:35:20.9013046Z graph_break [] 2025-12-04T10:35:20.9013605Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.xml - 2025-12-04T10:35:20.9013757Z =========================== short test summary info ============================ 2025-12-04T10:35:20.9014570Z FAILED [0.5834s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T10:35:20.9014575Z 2025-12-04T10:35:20.9014726Z Name=triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9014839Z Traceback (most recent call last): 2025-12-04T10:35:20.9015195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9015341Z return fn(*args, **kwargs) 2025-12-04T10:35:20.9015683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9015951Z return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9016315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9016407Z return fn(*args, **kwargs) 2025-12-04T10:35:20.9016748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9016927Z return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9017287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9017615Z self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 
2025-12-04T10:35:20.9017957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9018173Z return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9018516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9018725Z raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9019154Z ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9019164Z 2025-12-04T10:35:20.9019371Z The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9019375Z 2025-12-04T10:35:20.9019480Z Traceback (most recent call last): 2025-12-04T10:35:20.9019941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T10:35:20.9020024Z result = job() 2025-12-04T10:35:20.9020531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T10:35:20.9020657Z kernel.precompile(warm_cache_only=True) 2025-12-04T10:35:20.9021131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T10:35:20.9021234Z self._precompile_worker() 2025-12-04T10:35:20.9021789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9021937Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9022452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9022615Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T10:35:20.9023004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9023211Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9023627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9023917Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9024071Z triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9024341Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9024450Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9024628Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9024723Z xmask = xindex < xnumel 2025-12-04T10:35:20.9024797Z x0 = xindex 2025-12-04T10:35:20.9024940Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9025081Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9025154Z ^ 2025-12-04T10:35:20.9025485Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9025491Z 2025-12-04T10:35:20.9025495Z 2025-12-04T10:35:20.9026166Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9026170Z 2025-12-04T10:35:20.9026174Z 2025-12-04T10:35:20.9026357Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9027051Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9027056Z 2025-12-04T10:35:20.9027278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9027436Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.9027615Z ============= 1 failed, 2 passed, 57 deselected, 2 rerun in 4.43s ============== 2025-12-04T10:35:20.9027696Z Got exit code 1 2025-12-04T10:35:20.9027790Z Retrying single test... 2025-12-04T10:35:20.9028192Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.xml 2025-12-04T10:35:20.9028327Z ============================= test session starts ============================== 2025-12-04T10:35:20.9028628Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.9028718Z cachedir: .pytest_cache 2025-12-04T10:35:20.9029165Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.9029264Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.9029349Z configfile: pytest.ini 2025-12-04T10:35:20.9029817Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.9029998Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.9030605Z stepcurrent: skipping 59 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9030700Z Running 1 items in this shard 2025-12-04T10:35:20.9030704Z 2025-12-04T10:35:20.9031693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9032347Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9032804Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9033319Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9033729Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9034087Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9034637Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9035080Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9035545Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9036024Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T10:35:20.9036449Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9036912Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9037368Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9037669Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9039204Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9039667Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9040392Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9040827Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9041526Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9042129Z E1204 10:31:45.296000 92419 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9042897Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9043325Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9044038Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9044575Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9045351Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9046091Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9046842Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9047432Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9048214Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9048799Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9049547Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9049851Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9050423Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9050720Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9051174Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9052054Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9052591Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9053339Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9053918Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:20.9054659Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9055352Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9055880Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9056516Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9056978Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9057490Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9057903Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9058266Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9058799Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9059336Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9059724Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T10:35:20.9060421Z E1204 10:31:45.296000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9060529Z ('RERUN', {'yellow': True}) [1.7763s] [100%] 2025-12-04T10:35:20.9061469Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9062109Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9062563Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9063035Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9063447Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9063802Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9064301Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9064741Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9065166Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9065591Z E1204 10:31:45.658000 92419 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9066013Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9066472Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9066971Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9067273Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9068801Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9069295Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9070021Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9070492Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9071190Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9071828Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9072552Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9072974Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9073691Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9074223Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9074957Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9075648Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9076360Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9076948Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 
2025-12-04T10:35:20.9077656Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9078236Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9079026Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9079326Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9079898Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9080191Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9080646Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9081570Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9082103Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9082890Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9083463Z E1204 
10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9084795Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9085449Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9086018Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9086661Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9087123Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9087591Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9088004Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9088362Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9088860Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9089305Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 
2025-12-04T10:35:20.9089649Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9090345Z E1204 10:31:45.658000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9090457Z ('RERUN', {'yellow': True}) [0.3294s] [100%] 2025-12-04T10:35:20.9091516Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9092162Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9092616Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9093086Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9093538Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9093894Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9094406Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9094910Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9095337Z E1204 10:31:45.988000 92419 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9095767Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9096270Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9096737Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9097195Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9097496Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9099064Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9099525Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9100253Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9100678Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9101384Z E1204 
10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9101986Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9102710Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9103184Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9103903Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9104436Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9105171Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9105905Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9106626Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9107256Z E1204 10:31:45.988000 92419 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9108191Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9108843Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9109598Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9109906Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9110479Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9110771Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9111223Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9112100Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9112641Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9113395Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9113966Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9114719Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9115373Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9116007Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9116647Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9117108Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9117579Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9118048Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9118412Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9118909Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9119494Z E1204 
10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9119837Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9120574Z E1204 10:31:45.988000 92419 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9120661Z FAILED [0.3284s] [100%] 2025-12-04T10:35:20.9120666Z 2025-12-04T10:35:20.9120782Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.9121061Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9121160Z Traceback (most recent call last): 2025-12-04T10:35:20.9121475Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9121579Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9121993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9122200Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9122636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9122795Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9123227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9123343Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9123794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9124072Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, 
graph_kwargs) 2025-12-04T10:35:20.9124514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9124643Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9125052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9125152Z return self._compile_to_module() 2025-12-04T10:35:20.9125577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9125730Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9126192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9126344Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9126762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9126959Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9127455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9127563Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9128001Z File "/tmp/tmptdhme109/dj/cdjqtqned3bxnvaezdbponghgy2hgmx6ssmso7ah2t3uhuqwyvdf.py", line 51, in 2025-12-04T10:35:20.9128434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9128526Z kernel.precompile( 2025-12-04T10:35:20.9128994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9129091Z self._precompile_worker() 2025-12-04T10:35:20.9129637Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9129787Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9130303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9130506Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9130885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9131092Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9131460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9131746Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9131944Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9132207Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9132309Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9132419Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9132505Z xmask = xindex < xnumel 2025-12-04T10:35:20.9132584Z x0 = xindex 2025-12-04T10:35:20.9132722Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9132814Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9132893Z ^ 2025-12-04T10:35:20.9133217Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9133222Z 2025-12-04T10:35:20.9133835Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9133840Z 2025-12-04T10:35:20.9133843Z 2025-12-04T10:35:20.9134023Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9134709Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9134717Z 2025-12-04T10:35:20.9134937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9135113Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9135200Z frames [('total', 1)] 2025-12-04T10:35:20.9135294Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9135713Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9135932Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9136055Z graph_break [] 2025-12-04T10:35:20.9136328Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9136431Z Traceback (most recent call last): 2025-12-04T10:35:20.9136739Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9136842Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9137253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9137461Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9137967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9138123Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9138557Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9138675Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9139214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9139491Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9139970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9140093Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9140500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9140595Z return self._compile_to_module() 2025-12-04T10:35:20.9141005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9141138Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9141572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9141682Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9142099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9142298Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9142791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9142896Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9143323Z File "/tmp/tmp81bduxbe/s7/cs7x4qn472xwe2oluebbwe3u4t7ouf26zc2w7ghsbc3fxfid6nzg.py", line 51, in 2025-12-04T10:35:20.9143716Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9143807Z kernel.precompile( 2025-12-04T10:35:20.9144277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9144370Z self._precompile_worker() 2025-12-04T10:35:20.9144877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9145030Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9145531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9145696Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9146121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9146376Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9146748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9147031Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9147225Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9147488Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9147601Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9147710Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9147838Z xmask = xindex < xnumel 2025-12-04T10:35:20.9147913Z x0 = xindex 2025-12-04T10:35:20.9148051Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9148148Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9148226Z ^ 
2025-12-04T10:35:20.9148553Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9148558Z 2025-12-04T10:35:20.9149209Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9149215Z 2025-12-04T10:35:20.9149219Z 2025-12-04T10:35:20.9149398Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9150114Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9150125Z 2025-12-04T10:35:20.9150349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9150529Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9150620Z frames [('total', 1)] 2025-12-04T10:35:20.9150715Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9151116Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9151310Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9151386Z graph_break [] 2025-12-04T10:35:20.9151564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9151649Z frames [('total', 1)] 2025-12-04T10:35:20.9151740Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9151927Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9152324Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9152402Z graph_break [] 2025-12-04T10:35:20.9152524Z =================================== FAILURES 
=================================== 2025-12-04T10:35:20.9152800Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9152897Z Traceback (most recent call last): 2025-12-04T10:35:20.9153210Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9153311Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9153729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9153937Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9154373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9154538Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9154971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9155091Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9155594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9155917Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9156365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9156485Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9156888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9157034Z return self._compile_to_module() 2025-12-04T10:35:20.9157445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9157579Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9158016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9158119Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9158582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9158773Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9159272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9159415Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9159848Z File "/tmp/tmpjt4eng62/4e/c4eda7ev2o4trituhdtmd7aogonniqxncantpa6hwz7nvoqvx2bq.py", line 51, in 2025-12-04T10:35:20.9160244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9160333Z kernel.precompile( 2025-12-04T10:35:20.9160810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9160902Z self._precompile_worker() 2025-12-04T10:35:20.9161405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9161552Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9162058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9162220Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9162604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9162804Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:20.9163188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9163468Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9163663Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9163935Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9164030Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9164142Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9164228Z xmask = xindex < xnumel 2025-12-04T10:35:20.9164302Z x0 = xindex 2025-12-04T10:35:20.9164437Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9164534Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9164601Z ^ 2025-12-04T10:35:20.9164930Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9164935Z 2025-12-04T10:35:20.9165582Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9165588Z 2025-12-04T10:35:20.9165591Z 2025-12-04T10:35:20.9165778Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9166456Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9166464Z 2025-12-04T10:35:20.9166683Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9166908Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9166990Z frames [('total', 1)] 2025-12-04T10:35:20.9167080Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9167478Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9167658Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9167737Z graph_break [] 2025-12-04T10:35:20.9167950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9168028Z frames [('total', 1)] 2025-12-04T10:35:20.9168121Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9168301Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9168730Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9168813Z graph_break [] 2025-12-04T10:35:20.9168988Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9169071Z frames [('total', 1)] 2025-12-04T10:35:20.9169160Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9169338Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9169729Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9169804Z graph_break [] 2025-12-04T10:35:20.9170361Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.xml - 2025-12-04T10:35:20.9170504Z =========================== short test summary info ============================ 2025-12-04T10:35:20.9171167Z FAILED [0.3284s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9171442Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9171539Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9175402Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9175510Z xmask = xindex < xnumel 2025-12-04T10:35:20.9175599Z x0 = xindex 2025-12-04T10:35:20.9175746Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9175856Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9175934Z ^ 2025-12-04T10:35:20.9176284Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9176289Z 2025-12-04T10:35:20.9176897Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9176906Z 2025-12-04T10:35:20.9176912Z 2025-12-04T10:35:20.9177101Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9177796Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9177802Z 2025-12-04T10:35:20.9178120Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9178282Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.9178456Z ================== 1 failed, 187 deselected, 2 rerun in 2.47s ================== 2025-12-04T10:35:20.9178537Z Got exit code 1 2025-12-04T10:35:20.9178633Z Retrying single test... 2025-12-04T10:35:20.9179095Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.xml 2025-12-04T10:35:20.9179245Z ============================= test session starts ============================== 2025-12-04T10:35:20.9179589Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.9179679Z cachedir: .pytest_cache 2025-12-04T10:35:20.9180128Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.9180235Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.9180331Z configfile: pytest.ini 2025-12-04T10:35:20.9180840Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.9181034Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.9181657Z stepcurrent: skipping 59 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9181794Z Running 1 items in this shard 2025-12-04T10:35:20.9181799Z 2025-12-04T10:35:20.9182741Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9183392Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9183849Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9184323Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9184743Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9185107Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9185609Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9186050Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9186478Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9186908Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T10:35:20.9187340Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9187801Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9188263Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9188567Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9190154Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9190618Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9191386Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9191822Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9192563Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9193161Z E1204 10:31:55.885000 92600 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9193925Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9194352Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9195075Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9195613Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9196352Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9197045Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9197760Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9198360Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9199069Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9199652Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9200405Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9200713Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9201328Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9201628Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9202083Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9202964Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9203550Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9204298Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9204930Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:20.9205677Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9206422Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9206956Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9207596Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9208218Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9208698Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9209122Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9209485Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9209988Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9210439Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9210784Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T10:35:20.9211486Z E1204 10:31:55.885000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9211596Z ('RERUN', {'yellow': True}) [1.7930s] [100%] 2025-12-04T10:35:20.9212538Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9213190Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9213726Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9214204Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9214614Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9214981Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9215543Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9216034Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9216469Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9216950Z E1204 10:31:56.253000 92600 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9217381Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9217933Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9218392Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9218704Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9220276Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9220739Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9221470Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9221905Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9222606Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9223207Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9223933Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9224367Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9225087Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9225674Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9226421Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9227112Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9227873Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9228473Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 
2025-12-04T10:35:20.9229226Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9229817Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9230606Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9230921Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9231503Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9231806Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9232266Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9233150Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9233692Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9234447Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9235031Z E1204 
10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9235802Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9236486Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9237011Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9237695Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9238168Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9238642Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9239064Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9239426Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9239968Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9240426Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 
2025-12-04T10:35:20.9240777Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9241523Z E1204 10:31:56.253000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9241632Z ('RERUN', {'yellow': True}) [0.3347s] [100%] 2025-12-04T10:35:20.9242618Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9243270Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9243732Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9244220Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9244639Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9245012Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9245512Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9245986Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9246452Z E1204 10:31:56.583000 92600 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9246887Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9247322Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9247782Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9248243Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9248566Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9250144Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9250608Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9251349Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9251832Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9252540Z E1204 
10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9253182Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9253909Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9254387Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9255118Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9255657Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9256455Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9257158Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9257883Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9258484Z E1204 10:31:56.583000 92600 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9259262Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9259872Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9260625Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9260945Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9261520Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9261876Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9262338Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9263222Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9263775Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9264576Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9265164Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9265977Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9266631Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9267199Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9267845Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9268317Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9268790Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9269213Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9269575Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9270074Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9270527Z E1204 
10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9270876Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9271756Z E1204 10:31:56.583000 92600 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9271884Z FAILED [0.3277s] [100%] 2025-12-04T10:35:20.9271891Z 2025-12-04T10:35:20.9272056Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.9272435Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9272577Z Traceback (most recent call last): 2025-12-04T10:35:20.9272912Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9273017Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9273436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9273721Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9274162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9274328Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9274768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9274895Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9275354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9275674Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, 
graph_kwargs) 2025-12-04T10:35:20.9276113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9276249Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9276655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9276800Z return self._compile_to_module() 2025-12-04T10:35:20.9277214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9277351Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9277840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9277951Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9278369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9278572Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9279073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9279185Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9279625Z File "/tmp/tmpz27mbrio/fl/cfldzd5p5gcnn4h4wlqf2xbzc7442kpbkzeklebfxclchaxpg74g.py", line 51, in <module> 2025-12-04T10:35:20.9280020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9280119Z kernel.precompile( 2025-12-04T10:35:20.9280598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9280707Z self._precompile_worker() 2025-12-04T10:35:20.9281212Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9281360Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9281875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9282046Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9282428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9282637Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9283009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9283300Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9283497Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9283761Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9283877Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9284040Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9284126Z xmask = xindex < xnumel 2025-12-04T10:35:20.9284207Z x0 = xindex 2025-12-04T10:35:20.9284347Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9284450Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9284523Z ^ 2025-12-04T10:35:20.9284849Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9284857Z 2025-12-04T10:35:20.9285570Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9285620Z 2025-12-04T10:35:20.9285624Z 2025-12-04T10:35:20.9285812Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9286504Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9286509Z 2025-12-04T10:35:20.9286731Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9286955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9287042Z frames [('total', 1)] 2025-12-04T10:35:20.9287138Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9287586Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9287772Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9287853Z graph_break [] 2025-12-04T10:35:20.9288131Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9288235Z Traceback (most recent call last): 2025-12-04T10:35:20.9288546Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9288655Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9289071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9289291Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9289726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9289889Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9290331Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9290449Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9290902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9291183Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9291622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9291751Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9292158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9292260Z return self._compile_to_module() 2025-12-04T10:35:20.9292673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9292806Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9293252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9293359Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9293825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9294028Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9294524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9294635Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9295061Z File "/tmp/tmp6qpep_xi/ck/cckdvde4aeuiapbenhrjxpbdqh3fxkid3huwmrfblucw5ztfub3w.py", line 51, in <module> 2025-12-04T10:35:20.9295459Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9295677Z kernel.precompile( 2025-12-04T10:35:20.9296152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9296245Z self._precompile_worker() 2025-12-04T10:35:20.9296766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9296912Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9297467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9297635Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9298056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9298271Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9298650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9298946Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9299203Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9299480Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9299591Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9299706Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9299792Z xmask = xindex < xnumel 2025-12-04T10:35:20.9299872Z x0 = xindex 2025-12-04T10:35:20.9300017Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9300115Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9300193Z ^ 
2025-12-04T10:35:20.9300519Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9300527Z 2025-12-04T10:35:20.9301145Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9301149Z 2025-12-04T10:35:20.9301153Z 2025-12-04T10:35:20.9301341Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9302029Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9302034Z 2025-12-04T10:35:20.9302262Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9302443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9302535Z frames [('total', 1)] 2025-12-04T10:35:20.9302638Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9303035Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9303227Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9303309Z graph_break [] 2025-12-04T10:35:20.9303550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9303637Z frames [('total', 1)] 2025-12-04T10:35:20.9303730Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9303922Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9304317Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9304400Z graph_break [] 2025-12-04T10:35:20.9304528Z =================================== FAILURES 
=================================== 2025-12-04T10:35:20.9304804Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9304983Z Traceback (most recent call last): 2025-12-04T10:35:20.9305301Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9305406Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9305833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9306040Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9306519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9306683Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9307151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9307279Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9307906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9308183Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9308635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9308757Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9309181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9309285Z return self._compile_to_module() 2025-12-04T10:35:20.9309706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9309859Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9310301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9310414Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9310846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9311053Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9311576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9311686Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9312116Z File "/tmp/tmpgl4yqbld/w7/cw7v5yat3t2pzgsrj42o4p7sqcxta2w4ftba72uhnfptyzjfzn5d.py", line 51, in <module> 2025-12-04T10:35:20.9312525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9312620Z kernel.precompile( 2025-12-04T10:35:20.9313105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9313203Z self._precompile_worker() 2025-12-04T10:35:20.9313711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9313949Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9314462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9314627Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9315013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9315221Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:20.9315617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9315961Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9316164Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9316442Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9316542Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9316667Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9316814Z xmask = xindex < xnumel 2025-12-04T10:35:20.9316894Z x0 = xindex 2025-12-04T10:35:20.9317041Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9317139Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9317267Z ^ 2025-12-04T10:35:20.9317605Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9317610Z 2025-12-04T10:35:20.9318225Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9318230Z 2025-12-04T10:35:20.9318234Z 2025-12-04T10:35:20.9318420Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9319114Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9319119Z 2025-12-04T10:35:20.9319353Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9319544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9319632Z frames [('total', 1)] 2025-12-04T10:35:20.9319738Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9320130Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9320324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9320409Z graph_break [] 2025-12-04T10:35:20.9320584Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9320668Z frames [('total', 1)] 2025-12-04T10:35:20.9320776Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9320956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9321354Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9321440Z graph_break [] 2025-12-04T10:35:20.9321614Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9321702Z frames [('total', 1)] 2025-12-04T10:35:20.9321793Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9321972Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9322366Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9322446Z graph_break [] 2025-12-04T10:35:20.9323003Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.xml - 2025-12-04T10:35:20.9323193Z =========================== short test summary info ============================ 2025-12-04T10:35:20.9323864Z FAILED [0.3277s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9324139Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9324245Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9324355Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9324443Z xmask = xindex < xnumel 2025-12-04T10:35:20.9324561Z x0 = xindex 2025-12-04T10:35:20.9324704Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9324800Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9324870Z ^ 2025-12-04T10:35:20.9325204Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9325211Z 2025-12-04T10:35:20.9325852Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9325857Z 2025-12-04T10:35:20.9325861Z 2025-12-04T10:35:20.9326041Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9326717Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9326760Z 2025-12-04T10:35:20.9326990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9327143Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.9327308Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ================== 2025-12-04T10:35:20.9327392Z Got exit code 1 2025-12-04T10:35:20.9327861Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9328213Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:20.9328620Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.xml 2025-12-04T10:35:20.9328758Z ============================= test session starts ============================== 2025-12-04T10:35:20.9329046Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.9329142Z cachedir: .pytest_cache 2025-12-04T10:35:20.9329584Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.9329687Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.9329772Z configfile: pytest.ini 2025-12-04T10:35:20.9330230Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:20.9330424Z collecting ... collected 188 items / 60 deselected / 128 selected 2025-12-04T10:35:20.9330540Z stepcurrent: skipping 60 already run items. 2025-12-04T10:35:20.9330630Z Running 128 items in this shard 2025-12-04T10:35:20.9330638Z 2025-12-04T10:35:20.9331602Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9332245Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9332751Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9333223Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9333641Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9333999Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9334500Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9334983Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9335404Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9335839Z E1204 10:32:06.456000 92781 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9336351Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9336810Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9337314Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9337613Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9339209Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9339661Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9340392Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9340818Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9341522Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9342121Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9342837Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9343265Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9343981Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9344593Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9345326Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9346020Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9346728Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9347356Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 
2025-12-04T10:35:20.9348067Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9348683Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9349435Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9349771Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9350347Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9350640Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9351088Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9351971Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9352503Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9353257Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9353827Z E1204 
10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9354573Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9355221Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9355740Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9356382Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9356881Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9357362Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9357774Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9358130Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9358635Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9359114Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 
2025-12-04T10:35:20.9359459Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9360153Z E1204 10:32:06.456000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9360302Z ('RERUN', {'yellow': True}) [1.7902s] [ 0%] 2025-12-04T10:35:20.9361259Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9361938Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9362397Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9362865Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9363281Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9363638Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9364137Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9364578Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9365000Z E1204 10:32:06.818000 92781 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9365433Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9365932Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9366448Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9366908Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9367205Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9368786Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9369239Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9369965Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9370388Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9371129Z E1204 
10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9371728Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9372484Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9372947Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9373658Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9374198Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9374927Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9375620Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9376379Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9376964Z E1204 10:32:06.818000 92781 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9377677Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9378255Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9379008Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9379350Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9379928Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9380220Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9380714Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9381603Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9382130Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9382880Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9383494Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9384241Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9384938Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9385519Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9386212Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9386666Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9387141Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9387555Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9387910Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9388412Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9388851Z E1204 
10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9389196Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9389896Z E1204 10:32:06.818000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9390002Z ('RERUN', {'yellow': True}) [0.3295s] [ 0%] 2025-12-04T10:35:20.9390965Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9391602Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9392063Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9392572Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9392988Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9393347Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9393843Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9394284Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9394750Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9395180Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9395605Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9396151Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9396609Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9396942Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9398479Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9398932Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9399657Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9400081Z E1204 10:32:07.146000 92781 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9400787Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9401384Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9402101Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9402524Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9403233Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9403771Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9404544Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9405237Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9405946Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9406533Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9407287Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9408035Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9408856Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9409230Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9409847Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9410165Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9410645Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9411596Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9412163Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9412978Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9413592Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9414395Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9415095Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9415651Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9416341Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9416830Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9417401Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9417818Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9418173Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9418672Z E1204 
10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9419156Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9419561Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9420253Z E1204 10:32:07.146000 92781 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9420335Z FAILED [0.3267s] [ 0%] 2025-12-04T10:35:20.9420344Z 2025-12-04T10:35:20.9420500Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.9420783Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9420886Z Traceback (most recent call last): 2025-12-04T10:35:20.9421233Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9421339Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9421761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9421967Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9422403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9422562Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9422994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9423112Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9423562Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9423833Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9424280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9424401Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9424808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9424905Z return self._compile_to_module() 2025-12-04T10:35:20.9425315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9425454Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9425889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9426000Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9426415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9426612Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9427108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9427207Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9427686Z File "/tmp/tmps26qny1n/ad/cadav5mroc2gis34vnfqicpsaedrnx7sybmi2gcwceto4d5kskxj.py", line 51, in <module> 2025-12-04T10:35:20.9428085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9428171Z kernel.precompile( 2025-12-04T10:35:20.9428643Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9428737Z self._precompile_worker() 2025-12-04T10:35:20.9429240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9429430Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9429931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9430096Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9430478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9430745Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9431118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9431397Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9431634Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9431900Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9432001Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9432119Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9432203Z xmask = xindex < xnumel 2025-12-04T10:35:20.9432275Z x0 = xindex 2025-12-04T10:35:20.9432414Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9432510Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9432581Z ^ 2025-12-04T10:35:20.9432911Z type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9432916Z 2025-12-04T10:35:20.9433523Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9433531Z 2025-12-04T10:35:20.9433535Z 2025-12-04T10:35:20.9433715Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9434407Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9434412Z 2025-12-04T10:35:20.9434640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9434819Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9434902Z frames [('total', 1)] 2025-12-04T10:35:20.9434995Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9435397Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9435579Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9435661Z graph_break [] 2025-12-04T10:35:20.9435977Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9436089Z Traceback (most recent call last): 2025-12-04T10:35:20.9436397Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9436497Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9436909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9437159Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9437596Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9437759Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9438185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9438312Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9438762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9439077Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9439520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9439641Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9440046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9440182Z return self._compile_to_module() 2025-12-04T10:35:20.9440595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9440769Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9441203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9441311Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9441730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9441920Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9442420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 
2025-12-04T10:35:20.9442526Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9442947Z File "/tmp/tmpm6h_b101/xc/cxctxutdvenp3i3aoeg7ligi7mcyhn4myizgftgokhg7dgz6oocf.py", line 51, in <module> 2025-12-04T10:35:20.9443339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9443431Z kernel.precompile( 2025-12-04T10:35:20.9443908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9444004Z self._precompile_worker() 2025-12-04T10:35:20.9444506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9444653Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9445157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9445323Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9445706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9445909Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9446333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9446615Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9446806Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9447070Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9447214Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9447334Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9447418Z xmask = xindex <
xnumel 2025-12-04T10:35:20.9447491Z x0 = xindex 2025-12-04T10:35:20.9447632Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9447726Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9447796Z ^ 2025-12-04T10:35:20.9448122Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9448128Z 2025-12-04T10:35:20.9448730Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9448779Z 2025-12-04T10:35:20.9448783Z 2025-12-04T10:35:20.9448966Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9449657Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9449662Z 2025-12-04T10:35:20.9449922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9450103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9450183Z frames [('total', 1)] 2025-12-04T10:35:20.9450324Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9450722Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9450908Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9450987Z graph_break [] 2025-12-04T10:35:20.9451160Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9451240Z frames [('total', 1)] 2025-12-04T10:35:20.9451340Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9451520Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9451914Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9451994Z graph_break [] 2025-12-04T10:35:20.9452110Z =================================== FAILURES =================================== 2025-12-04T10:35:20.9452397Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9452497Z Traceback (most recent call last): 2025-12-04T10:35:20.9452805Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9452913Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9453328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9453537Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9453974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9454130Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9454564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9454679Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9455128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9455401Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9455837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9455958Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9456409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9456506Z return 
self._compile_to_module() 2025-12-04T10:35:20.9456917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9457048Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9457484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9457590Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9458050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9458241Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9458734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9458840Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9459316Z File "/tmp/tmpif5dtou7/lx/clx56sov4ibdgkhsbhibfga667qsrwikk7i4vjlko3qqsg4u4sje.py", line 51, in <module> 2025-12-04T10:35:20.9459751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9459842Z kernel.precompile( 2025-12-04T10:35:20.9460350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9460440Z self._precompile_worker() 2025-12-04T10:35:20.9460955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9461100Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9461607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9461769Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9462147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9462353Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9462725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9463010Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9463204Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9463466Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9463565Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9463674Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9463759Z xmask = xindex < xnumel 2025-12-04T10:35:20.9463839Z x0 = xindex 2025-12-04T10:35:20.9463971Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9464064Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9467831Z ^ 2025-12-04T10:35:20.9468175Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9468180Z 2025-12-04T10:35:20.9468806Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9468814Z 2025-12-04T10:35:20.9468818Z 2025-12-04T10:35:20.9469000Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9469700Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9469706Z 2025-12-04T10:35:20.9470076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9470260Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9470354Z frames [('total', 1)] 2025-12-04T10:35:20.9470451Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9470851Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9471050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9471134Z graph_break [] 2025-12-04T10:35:20.9471392Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9471480Z frames [('total', 1)] 2025-12-04T10:35:20.9471574Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9471762Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9472161Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9472241Z graph_break [] 2025-12-04T10:35:20.9472463Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9472551Z frames [('total', 1)] 2025-12-04T10:35:20.9472649Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9472834Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9473266Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9473359Z graph_break [] 2025-12-04T10:35:20.9473917Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.xml - 2025-12-04T10:35:20.9474059Z =========================== short test summary info ============================ 2025-12-04T10:35:20.9474756Z FAILED [0.3267s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9475027Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9475139Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9475256Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9475345Z xmask = xindex < xnumel 2025-12-04T10:35:20.9475435Z x0 = xindex 2025-12-04T10:35:20.9475578Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9475676Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9475782Z ^ 2025-12-04T10:35:20.9476141Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9476146Z 2025-12-04T10:35:20.9476761Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9476766Z 2025-12-04T10:35:20.9476770Z 2025-12-04T10:35:20.9476952Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9477648Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9477657Z 2025-12-04T10:35:20.9477883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9478037Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.9478213Z ================== 1 failed, 60 deselected, 2 rerun in 2.48s =================== 2025-12-04T10:35:20.9478293Z Got exit code 1 2025-12-04T10:35:20.9478380Z Retrying single test... 2025-12-04T10:35:20.9478795Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.xml 2025-12-04T10:35:20.9478973Z ============================= test session starts ============================== 2025-12-04T10:35:20.9479270Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.9479360Z cachedir: .pytest_cache 2025-12-04T10:35:20.9479807Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.9479915Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.9480000Z configfile: pytest.ini 2025-12-04T10:35:20.9480466Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.9480698Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.9481317Z stepcurrent: skipping 60 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9481418Z Running 1 items in this shard 2025-12-04T10:35:20.9481423Z 2025-12-04T10:35:20.9482430Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9483083Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9483587Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9484066Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9484493Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9484852Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9485361Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9485830Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9486280Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9486716Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T10:35:20.9487141Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9487610Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9488074Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9488371Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9489917Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9490415Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9491156Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9491582Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9492295Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9492936Z E1204 10:32:17.042000 92962 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9493663Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9494122Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9494837Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9495412Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9496197Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9496896Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9497608Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9498205Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9498914Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9499555Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9500317Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9500620Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9501202Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9501503Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9501948Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9502889Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9503423Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9504187Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9504766Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:20.9505559Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9506261Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9506829Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9507467Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9508131Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9508612Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9509031Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9509399Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9509896Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9510347Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9510702Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T10:35:20.9511405Z E1204 10:32:17.042000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9511525Z ('RERUN', {'yellow': True}) [1.7874s] [100%] 2025-12-04T10:35:20.9512492Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9513131Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9513604Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9514077Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9514496Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9514964Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9515471Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9515914Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9516347Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9516842Z E1204 10:32:17.409000 92962 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9517265Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9517730Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9518247Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9518551Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9520094Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9520607Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9521341Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9521764Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9522480Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9523077Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9523807Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9524232Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9524945Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9525498Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9526279Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9527029Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9527745Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9528347Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 
2025-12-04T10:35:20.9529064Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9529689Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9530444Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9530781Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9531364Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9531719Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9532167Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9533064Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9533601Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9534354Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9534929Z E1204 
10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9535679Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9536336Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9536867Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9537507Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9537966Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9538442Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9538861Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9539312Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9539856Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9540328Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 
2025-12-04T10:35:20.9540707Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9541496Z E1204 10:32:17.409000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9541625Z ('RERUN', {'yellow': True}) [0.3343s] [100%] 2025-12-04T10:35:20.9542655Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9543378Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9543877Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9544346Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9544777Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9545136Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9545637Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9546129Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9546552Z E1204 10:32:17.740000 92962 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9546992Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9547415Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9547884Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9548343Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9548647Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9550188Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9550642Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9551418Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9551849Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9552559Z E1204 
10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9553158Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9553928Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9554358Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9555108Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9555718Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9556453Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9557154Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9557867Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9558470Z E1204 10:32:17.740000 92962 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9559186Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9559770Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9560530Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9560830Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9561408Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9561711Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9562158Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9563048Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9563626Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9564382Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9564960Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9565753Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9566406Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9566972Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9567730Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9568393Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9568968Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9569383Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9569749Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9570249Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9570687Z E1204 
10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9571038Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9571731Z E1204 10:32:17.740000 92962 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9571822Z FAILED [0.3293s] [100%] 2025-12-04T10:35:20.9571827Z 2025-12-04T10:35:20.9571945Z ==================================== RERUNS ==================================== 2025-12-04T10:35:20.9572233Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9572339Z Traceback (most recent call last): 2025-12-04T10:35:20.9572654Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9572763Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9573174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9573387Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9573826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9573988Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9574420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9574601Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9575052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9575331Z return scheme.codegen_and_compile(gm, example_inputs, 
inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9575771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9575897Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9576308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9576450Z return self._compile_to_module() 2025-12-04T10:35:20.9576864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9576999Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9577439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9577549Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9578009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9578203Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9578749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9578851Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9579357Z File "/tmp/tmph16ogg6w/qk/cqkrfvgghjdw2oxnsqjczq6z4hpx5fcyqydtg3tsmk4ty2b7phyh.py", line 51, in <module> 2025-12-04T10:35:20.9579749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9579836Z kernel.precompile( 2025-12-04T10:35:20.9580312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9580407Z self._precompile_worker() 2025-12-04T10:35:20.9580923Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9581070Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9581577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9581840Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9582219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9582425Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9582806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9583087Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9583293Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9583556Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9583659Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9583775Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9583860Z xmask = xindex < xnumel 2025-12-04T10:35:20.9583936Z x0 = xindex 2025-12-04T10:35:20.9584075Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9584173Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9584250Z ^ 2025-12-04T10:35:20.9584580Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9584585Z 2025-12-04T10:35:20.9585242Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9585250Z 2025-12-04T10:35:20.9585254Z 2025-12-04T10:35:20.9585437Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9586127Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9586135Z 2025-12-04T10:35:20.9586365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9586592Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9586676Z frames [('total', 1)] 2025-12-04T10:35:20.9586772Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9587179Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9587367Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9587444Z graph_break [] 2025-12-04T10:35:20.9587768Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9587876Z Traceback (most recent call last): 2025-12-04T10:35:20.9588191Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9588333Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9588754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9588964Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9589406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9589567Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9589997Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9590123Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9590579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9590858Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9591296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9591419Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9591836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9591934Z return self._compile_to_module() 2025-12-04T10:35:20.9592350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9592491Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9592927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9593045Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9593470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9593661Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9594162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9594264Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9594757Z File "/tmp/tmpxjl3qbd0/pz/cpzjikpmr2b667q33ht6zd63esxhsqxbwgwz32qrsuxi6ypmlhuo.py", line 51, in <module> 2025-12-04T10:35:20.9595151Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9595240Z kernel.precompile( 2025-12-04T10:35:20.9595720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9595818Z self._precompile_worker() 2025-12-04T10:35:20.9596323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9596518Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9597024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9597192Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9597571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9597772Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9598226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9598507Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9598748Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9599014Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9599118Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9599234Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9599323Z xmask = xindex < xnumel 2025-12-04T10:35:20.9599396Z x0 = xindex 2025-12-04T10:35:20.9599540Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9599637Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9599708Z ^ 
2025-12-04T10:35:20.9600044Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9600051Z 2025-12-04T10:35:20.9600656Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9600663Z 2025-12-04T10:35:20.9600667Z 2025-12-04T10:35:20.9600859Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9601550Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9601557Z 2025-12-04T10:35:20.9601791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9601971Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9602053Z frames [('total', 1)] 2025-12-04T10:35:20.9602153Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9602551Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9602734Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9602821Z graph_break [] 2025-12-04T10:35:20.9602996Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9603084Z frames [('total', 1)] 2025-12-04T10:35:20.9603180Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9603362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9603760Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9603839Z graph_break [] 2025-12-04T10:35:20.9604002Z =================================== FAILURES 
=================================== 2025-12-04T10:35:20.9604294Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _ 2025-12-04T10:35:20.9604399Z Traceback (most recent call last): 2025-12-04T10:35:20.9604714Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:20.9604817Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:20.9605231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:20.9605444Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:20.9605964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:20.9606131Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:20.9606568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:20.9606686Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:20.9607185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:20.9607456Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:20.9608094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:20.9608222Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:20.9608636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:20.9608741Z return self._compile_to_module() 2025-12-04T10:35:20.9609151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:20.9609285Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:20.9609728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:20.9609835Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:20.9610258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:20.9610468Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:20.9610970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:20.9611083Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:20.9611515Z File "/tmp/tmpvuuppred/c5/cc5cgexulgv3lppnv2u6q5gbbeex5lylepyk2nin73lxcc6xd22t.py", line 51, in <module> 2025-12-04T10:35:20.9611913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:20.9612013Z kernel.precompile( 2025-12-04T10:35:20.9612490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:20.9612594Z self._precompile_worker() 2025-12-04T10:35:20.9613107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:20.9613261Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:20.9613773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9613938Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9614314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9614596Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:20.9614967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9615250Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9615445Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9615713Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9615816Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9615934Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9616130Z xmask = xindex < xnumel 2025-12-04T10:35:20.9616223Z x0 = xindex 2025-12-04T10:35:20.9616361Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9616455Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9616527Z ^ 2025-12-04T10:35:20.9616855Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9616859Z 2025-12-04T10:35:20.9617522Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9617527Z 2025-12-04T10:35:20.9617531Z 2025-12-04T10:35:20.9617712Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9618469Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9618476Z 2025-12-04T10:35:20.9618697Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9618872Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9618959Z frames [('total', 1)] 2025-12-04T10:35:20.9619099Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9619499Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9619685Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9619760Z graph_break [] 2025-12-04T10:35:20.9619939Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9620022Z frames [('total', 1)] 2025-12-04T10:35:20.9620110Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9620295Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9620688Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9620766Z graph_break [] 2025-12-04T10:35:20.9620940Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:20.9621017Z frames [('total', 1)] 2025-12-04T10:35:20.9621113Z stats [('calls_captured', 4)] 2025-12-04T10:35:20.9621296Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:20.9621684Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:20.9621762Z graph_break [] 2025-12-04T10:35:20.9622320Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.xml - 2025-12-04T10:35:20.9622468Z =========================== short test summary info ============================ 2025-12-04T10:35:20.9623154Z FAILED [0.3293s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:20.9623426Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9623527Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9623686Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9623776Z xmask = xindex < xnumel 2025-12-04T10:35:20.9623853Z x0 = xindex 2025-12-04T10:35:20.9623990Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9624086Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9624157Z ^ 2025-12-04T10:35:20.9624480Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9624488Z 2025-12-04T10:35:20.9625091Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:20.9625139Z 2025-12-04T10:35:20.9625142Z 2025-12-04T10:35:20.9625321Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:20.9626075Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9626080Z 2025-12-04T10:35:20.9626344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:20.9626494Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:20.9626660Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ================== 2025-12-04T10:35:20.9626779Z Got exit code 1 2025-12-04T10:35:20.9626866Z Retrying single test... 2025-12-04T10:35:20.9627263Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.xml 2025-12-04T10:35:20.9627398Z ============================= test session starts ============================== 2025-12-04T10:35:20.9627690Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:20.9627777Z cachedir: .pytest_cache 2025-12-04T10:35:20.9628222Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:20.9628327Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:20.9628414Z configfile: pytest.ini 2025-12-04T10:35:20.9628882Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:20.9629068Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:20.9629684Z stepcurrent: skipping 60 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 2025-12-04T10:35:20.9629778Z Running 1 items in this shard 2025-12-04T10:35:20.9629782Z 2025-12-04T10:35:20.9630743Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9631390Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9631846Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9632318Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9632740Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9633096Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9633639Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9634080Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9634513Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9634936Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T10:35:20.9635357Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9635884Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9636342Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9636643Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9638216Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9638797Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9639528Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9639951Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9640654Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9641256Z E1204 10:32:27.588000 93143 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9641978Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9642401Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9643117Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9643650Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9644382Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9645076Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9645825Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9646419Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9647128Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9647713Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9648505Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9648803Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9649415Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9649711Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9650211Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9651095Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9651629Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9652382Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9652951Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:20.9653696Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9654348Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9654868Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9655506Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9655959Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9656434Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9656847Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9657217Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9657768Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9658214Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9658557Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T10:35:20.9659350Z E1204 10:32:27.588000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9659506Z ('RERUN', {'yellow': True}) [1.7700s] [100%] 2025-12-04T10:35:20.9660465Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9661107Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9661606Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9662075Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9662531Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9662890Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9663394Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9663834Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9664257Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9664689Z E1204 10:32:27.950000 93143 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9665109Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9665569Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9666024Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9666326Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9667858Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9668316Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9669038Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9669506Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9670216Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9670809Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9671530Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9671994Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9672704Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9673288Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9674017Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9674755Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9675465Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9676104Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 
2025-12-04T10:35:20.9676814Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9677393Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9678147Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9678443Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9679020Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9679318Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9679766Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9680647Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9681176Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9682070Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9682646Z E1204 
10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9683396Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9684095Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9684617Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9685293Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9685747Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9686273Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9686723Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9687083Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9687579Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9688022Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 
2025-12-04T10:35:20.9688364Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.9689055Z E1204 10:32:27.950000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9689166Z ('RERUN', {'yellow': True}) [0.3301s] [100%]
2025-12-04T10:35:20.9690126Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9690762Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9691218Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9691683Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9692107Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.9692462Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.9692957Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9693439Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9693863Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9694294Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9694714Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9695175Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9695675Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9695974Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.9697551Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*bf16', 'out_ptr0': '*bf16', 'out_ptr1': '*bf16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9698038Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9698765Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9699235Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:20.9699942Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9700535Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9701258Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9701681Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:20.9702396Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9702936Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9703668Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9704362Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9705079Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9705720Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9706432Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9707017Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9707946Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9708245Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.9708823Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9709184Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.9709632Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9710567Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9711097Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9711854Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9712424Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9713167Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9713818Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9714340Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9714976Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9715431Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9715952Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9716365Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.9716730Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.9717225Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9717728Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9718080Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.9718772Z E1204 10:32:28.280000 93143 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9718862Z FAILED [0.3281s] [100%]
2025-12-04T10:35:20.9718867Z
2025-12-04T10:35:20.9718983Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.9719327Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9719427Z Traceback (most recent call last):
2025-12-04T10:35:20.9719734Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9719839Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9720254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9720503Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9721007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9721324Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9721888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9722049Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9722553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9722830Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9723276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9723404Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9723808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9723906Z return self._compile_to_module()
2025-12-04T10:35:20.9724317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9724449Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9724885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9724991Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9725408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9725603Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9726152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9726256Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9726694Z File "/tmp/tmpxotdeeut/wa/cwas4k5bmikkdvpmygvybq3wo6qu6hftgs6fwgnnnlpq7rkgjhxv.py", line 51, in <module>
2025-12-04T10:35:20.9727084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9727181Z kernel.precompile(
2025-12-04T10:35:20.9727651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9727744Z self._precompile_worker()
2025-12-04T10:35:20.9728312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9728459Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9728963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9729129Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9729510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9729715Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9730132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9730416Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9730611Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9730883Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9730986Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9731137Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9731223Z xmask = xindex < xnumel
2025-12-04T10:35:20.9731303Z x0 = xindex
2025-12-04T10:35:20.9731438Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9731578Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9731654Z ^
2025-12-04T10:35:20.9731978Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9731986Z
2025-12-04T10:35:20.9732593Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9732603Z
2025-12-04T10:35:20.9732606Z
2025-12-04T10:35:20.9732787Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9733479Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9733484Z
2025-12-04T10:35:20.9733711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9733889Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9733982Z frames [('total', 1)]
2025-12-04T10:35:20.9734075Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9734475Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9734666Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9734748Z graph_break []
2025-12-04T10:35:20.9735033Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9735134Z Traceback (most recent call last):
2025-12-04T10:35:20.9735442Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9735547Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9735959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9736167Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9736611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9736769Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9737197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9737315Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9737813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9738095Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9738532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9738654Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9739132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9739276Z return self._compile_to_module()
2025-12-04T10:35:20.9739687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9739820Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9740255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9740364Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9740839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9741031Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9741531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9741675Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9742112Z File "/tmp/tmphrmls84p/qy/cqyxrggtsu3ukvo3bajxykjun7e27sru4bfxaaako652gqnsbq3k.py", line 51, in <module>
2025-12-04T10:35:20.9742502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9742590Z kernel.precompile(
2025-12-04T10:35:20.9743061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9743157Z self._precompile_worker()
2025-12-04T10:35:20.9743663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9743807Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9744311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9744476Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9744853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9745057Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9745431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9745710Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9745911Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9746176Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9746275Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9746390Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9746475Z xmask = xindex < xnumel
2025-12-04T10:35:20.9746548Z x0 = xindex
2025-12-04T10:35:20.9746692Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9746786Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9746856Z ^
2025-12-04T10:35:20.9747181Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9747186Z
2025-12-04T10:35:20.9747836Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9747841Z
2025-12-04T10:35:20.9747850Z
2025-12-04T10:35:20.9748034Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9748725Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9748733Z
2025-12-04T10:35:20.9748956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9749176Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9749260Z frames [('total', 1)]
2025-12-04T10:35:20.9749355Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9749754Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9749948Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9750024Z graph_break []
2025-12-04T10:35:20.9750237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9750321Z frames [('total', 1)]
2025-12-04T10:35:20.9750411Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9750588Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9751021Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9751103Z graph_break []
2025-12-04T10:35:20.9751220Z =================================== FAILURES ===================================
2025-12-04T10:35:20.9751501Z _ TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 _
2025-12-04T10:35:20.9751599Z Traceback (most recent call last):
2025-12-04T10:35:20.9751913Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9752013Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9752422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9752631Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9753061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9753226Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9753655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9753773Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9754224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9754501Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9754942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9755060Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9755467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9755570Z return self._compile_to_module()
2025-12-04T10:35:20.9759718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9759882Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9760334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9760441Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9760935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9761134Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9761635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9761748Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9762184Z File "/tmp/tmppq3rutt5/x3/cx3pnwl6zyhlokbor4oxc2kmiejg3so3sdw4yry6dg4h76h56h5p.py", line 51, in <module>
2025-12-04T10:35:20.9762634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9762726Z kernel.precompile(
2025-12-04T10:35:20.9763196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9763300Z self._precompile_worker()
2025-12-04T10:35:20.9763809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9764028Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9764537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9764745Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9765130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9765336Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9765705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9765992Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9766192Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9766465Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9766563Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9766676Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9766771Z xmask = xindex < xnumel
2025-12-04T10:35:20.9766847Z x0 = xindex
2025-12-04T10:35:20.9766980Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9767082Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9767154Z ^
2025-12-04T10:35:20.9767481Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9767486Z
2025-12-04T10:35:20.9768096Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9768104Z
2025-12-04T10:35:20.9768108Z
2025-12-04T10:35:20.9768292Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9769001Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9769006Z
2025-12-04T10:35:20.9769231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9769413Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9769495Z frames [('total', 1)]
2025-12-04T10:35:20.9769588Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9769994Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9770183Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9770308Z graph_break []
2025-12-04T10:35:20.9770490Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9770573Z frames [('total', 1)]
2025-12-04T10:35:20.9770675Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9770857Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9771250Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9771334Z graph_break []
2025-12-04T10:35:20.9771511Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9771638Z frames [('total', 1)]
2025-12-04T10:35:20.9771736Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9771915Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9772315Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9772405Z graph_break []
2025-12-04T10:35:20.9772961Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.xml -
2025-12-04T10:35:20.9773153Z =========================== short test summary info ============================
2025-12-04T10:35:20.9773842Z FAILED [0.3281s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9774148Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9774254Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9774367Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9774457Z xmask = xindex < xnumel
2025-12-04T10:35:20.9774533Z x0 = xindex
2025-12-04T10:35:20.9774670Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9774774Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9774845Z ^
2025-12-04T10:35:20.9775169Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9775177Z
2025-12-04T10:35:20.9775783Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9775791Z
2025-12-04T10:35:20.9775794Z
2025-12-04T10:35:20.9775972Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9776668Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9776675Z
2025-12-04T10:35:20.9776897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9777054Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.9777218Z ================== 1 failed, 187 deselected, 2 rerun in 2.46s ==================
2025-12-04T10:35:20.9777295Z Got exit code 1
2025-12-04T10:35:20.9777784Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16
2025-12-04T10:35:20.9778134Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:35:20.9778535Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.xml
2025-12-04T10:35:20.9778677Z ============================= test session starts ==============================
2025-12-04T10:35:20.9778965Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.9779120Z cachedir: .pytest_cache
2025-12-04T10:35:20.9779616Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.9779716Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.9779807Z configfile: pytest.ini
2025-12-04T10:35:20.9780273Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.9780471Z collecting ... collected 188 items / 61 deselected / 127 selected 2025-12-04T10:35:20.9780590Z stepcurrent: skipping 61 already run items. 2025-12-04T10:35:20.9780680Z Running 127 items in this shard 2025-12-04T10:35:20.9780685Z 2025-12-04T10:35:20.9781622Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9782309Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9782809Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9783283Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9783735Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9784103Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9784610Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9785055Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9785487Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9785915Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9786342Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9786802Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9787267Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9787567Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9789115Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9789571Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9790301Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9790775Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9791479Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9792084Z 
E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9792803Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9793279Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9793989Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9794566Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9795304Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9796081Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9796809Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9797402Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9798120Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9798697Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9799452Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9799762Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9800340Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9800646Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9801092Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9801978Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9802510Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9803300Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9803882Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9804624Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9805291Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9805877Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9806527Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9807024Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9807495Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9808129Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9808495Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9809010Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9809452Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9809798Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T10:35:20.9810494Z E1204 10:32:38.216000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9810601Z ('RERUN', {'yellow': True}) [1.7889s] [ 0%] 2025-12-04T10:35:20.9811532Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9812169Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9812639Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9813110Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9813524Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9813888Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9814389Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9814841Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9815343Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9815797Z E1204 10:32:38.578000 93324 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9816250Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9816712Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9817179Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9817532Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9819249Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9819759Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9820483Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9820917Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9821620Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9822222Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9822940Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9823381Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9824090Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9824630Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9825371Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9826062Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9826778Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9827406Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 
2025-12-04T10:35:20.9828126Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9828701Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9829449Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9829794Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9830370Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9830668Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9831155Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9832046Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9832614Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9833363Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9833947Z E1204 
10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9834691Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9835350Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9835889Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9836567Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9837022Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9837497Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9837928Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9838290Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9838806Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9839246Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 
2025-12-04T10:35:20.9839636Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9840350Z E1204 10:32:38.578000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9840463Z ('RERUN', {'yellow': True}) [0.3298s] [ 0%] 2025-12-04T10:35:20.9841396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9842079Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9842548Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9843068Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9843480Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9843852Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9844391Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9844843Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9845268Z E1204 10:32:38.907000 93324 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9845702Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9846162Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9846644Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9847109Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9847406Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9848945Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9849399Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9850124Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9850562Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9851330Z E1204 
10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9851937Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:20.9852653Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9853083Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9853834Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:20.9854372Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:20.9855149Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:20.9855849Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:20.9856767Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:20.9857538Z E1204 10:32:38.907000 93324 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:20.9858284Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9858866Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9859667Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9859987Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9860565Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9860868Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9861315Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9862199Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9862730Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9863480Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9864121Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9864873Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9865528Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9866048Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9866738Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9867195Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9867703Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9868123Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9868523Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9869027Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9869469Z E1204 
10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9869813Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.9870607Z E1204 10:32:38.907000 93324 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9870697Z FAILED [0.3271s] [ 0%]
2025-12-04T10:35:20.9870702Z
2025-12-04T10:35:20.9870826Z ==================================== RERUNS ====================================
2025-12-04T10:35:20.9871094Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:20.9871199Z Traceback (most recent call last):
2025-12-04T10:35:20.9871517Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9871625Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9872043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9872255Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9872694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9872859Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9873291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9873415Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9873874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9874150Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9874597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9874769Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9875179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9875287Z return self._compile_to_module()
2025-12-04T10:35:20.9875699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9875837Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9876278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9876428Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9876847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9877043Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9877540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9877649Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9878121Z File "/tmp/tmp6bdnmq07/i2/ci24zwedewtteulurulj2yqpzur36uxl7jsfe23vzvkjvdvmqmz5.py", line 51, in <module>
2025-12-04T10:35:20.9878521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9878650Z kernel.precompile(
2025-12-04T10:35:20.9879119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9879225Z self._precompile_worker()
2025-12-04T10:35:20.9879735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9879892Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9880398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9880565Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9880958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9881161Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9881540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9881833Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9882035Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9882307Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9882410Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9882529Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9882627Z xmask = xindex < xnumel
2025-12-04T10:35:20.9882704Z x0 = xindex
2025-12-04T10:35:20.9882847Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9882948Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9883020Z ^
2025-12-04T10:35:20.9883357Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9883365Z
2025-12-04T10:35:20.9883973Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9883979Z
2025-12-04T10:35:20.9883983Z
2025-12-04T10:35:20.9884167Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9884894Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9884900Z
2025-12-04T10:35:20.9885128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9885322Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9885403Z frames [('total', 1)]
2025-12-04T10:35:20.9885498Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9885903Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9886098Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9886228Z graph_break []
2025-12-04T10:35:20.9886492Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:20.9886594Z Traceback (most recent call last):
2025-12-04T10:35:20.9886911Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9887020Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9887434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9887688Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9888128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9888364Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9888797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9888923Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9889377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9889653Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9890105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9890227Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9890630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9890743Z return self._compile_to_module()
2025-12-04T10:35:20.9891152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9891289Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9891732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9891836Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9892266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9892460Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9892958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9893071Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9893503Z File "/tmp/tmpfwefhtvo/le/cleqsmrvkhej5ymfpal7rs462idmf4ikyw24x6hg226j3bk5u7iz.py", line 51, in <module>
2025-12-04T10:35:20.9893907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9894001Z kernel.precompile(
2025-12-04T10:35:20.9894470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9894571Z self._precompile_worker()
2025-12-04T10:35:20.9895122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9895272Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9895783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9895947Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9896334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9896539Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9896953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9897242Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9897440Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9897712Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9897814Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9897966Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9898065Z xmask = xindex < xnumel
2025-12-04T10:35:20.9898141Z x0 = xindex
2025-12-04T10:35:20.9898277Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9898420Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9898494Z ^
2025-12-04T10:35:20.9898823Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9898830Z
2025-12-04T10:35:20.9899489Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9899493Z
2025-12-04T10:35:20.9899497Z
2025-12-04T10:35:20.9899679Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9900360Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9900365Z
2025-12-04T10:35:20.9900587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9900777Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9900860Z frames [('total', 1)]
2025-12-04T10:35:20.9900955Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9901366Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9901555Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9901632Z graph_break []
2025-12-04T10:35:20.9901815Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9901902Z frames [('total', 1)]
2025-12-04T10:35:20.9902001Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9902186Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9902581Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9902668Z graph_break []
2025-12-04T10:35:20.9902786Z =================================== FAILURES ===================================
2025-12-04T10:35:20.9903044Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:20.9903156Z Traceback (most recent call last):
2025-12-04T10:35:20.9903469Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:20.9903580Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:20.9904041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:20.9904253Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:20.9904689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:20.9904847Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:20.9905279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:20.9905410Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:20.9905901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:20.9906175Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:20.9906621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:20.9906740Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:20.9907193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:20.9907289Z return self._compile_to_module()
2025-12-04T10:35:20.9907699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:20.9908039Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:20.9908473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:20.9908589Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:20.9909004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:20.9909194Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:20.9909701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:20.9909807Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:20.9910239Z File "/tmp/tmpggefi57i/ae/caea5evwbb6enzzbyc6agqavzrfi3hoa7mvo62gulno5ykg47bu6.py", line 51, in <module>
2025-12-04T10:35:20.9910629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:20.9910718Z kernel.precompile(
2025-12-04T10:35:20.9911187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:20.9911283Z self._precompile_worker()
2025-12-04T10:35:20.9911790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:20.9911939Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:20.9912441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9912607Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9912981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9913197Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9913564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9913842Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9914036Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9914300Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9914477Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9914593Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9914677Z xmask = xindex < xnumel
2025-12-04T10:35:20.9914756Z x0 = xindex
2025-12-04T10:35:20.9914894Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9914990Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9915067Z ^
2025-12-04T10:35:20.9915395Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9915400Z
2025-12-04T10:35:20.9916003Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9916071Z
2025-12-04T10:35:20.9916078Z
2025-12-04T10:35:20.9916257Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9916931Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9916936Z
2025-12-04T10:35:20.9917212Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9917389Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9917473Z frames [('total', 1)]
2025-12-04T10:35:20.9917624Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9918019Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9918208Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9918286Z graph_break []
2025-12-04T10:35:20.9918459Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9918545Z frames [('total', 1)]
2025-12-04T10:35:20.9918636Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9918824Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9919221Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9919298Z graph_break []
2025-12-04T10:35:20.9919474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:20.9919557Z frames [('total', 1)]
2025-12-04T10:35:20.9919649Z stats [('calls_captured', 4)]
2025-12-04T10:35:20.9919839Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:20.9920227Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:20.9920304Z graph_break []
2025-12-04T10:35:20.9920859Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.xml -
2025-12-04T10:35:20.9920999Z =========================== short test summary info ============================
2025-12-04T10:35:20.9921656Z FAILED [0.3271s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:20.9921921Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9922022Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9922139Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9922225Z xmask = xindex < xnumel
2025-12-04T10:35:20.9922297Z x0 = xindex
2025-12-04T10:35:20.9922434Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9922527Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9922604Z ^
2025-12-04T10:35:20.9922925Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9922930Z
2025-12-04T10:35:20.9923581Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:20.9923586Z
2025-12-04T10:35:20.9923592Z
2025-12-04T10:35:20.9923771Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:20.9924436Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9924443Z
2025-12-04T10:35:20.9924668Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:20.9924854Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:20.9925018Z ================== 1 failed, 61 deselected, 2 rerun in 2.48s ===================
2025-12-04T10:35:20.9925101Z Got exit code 1
2025-12-04T10:35:20.9925184Z Retrying single test...
2025-12-04T10:35:20.9925593Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.xml
2025-12-04T10:35:20.9925798Z ============================= test session starts ==============================
2025-12-04T10:35:20.9926089Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:20.9926180Z cachedir: .pytest_cache
2025-12-04T10:35:20.9926661Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:20.9926761Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:20.9926853Z configfile: pytest.ini
2025-12-04T10:35:20.9927308Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:20.9927494Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:20.9928092Z stepcurrent: skipping 61 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:20.9928184Z Running 1 items in this shard
2025-12-04T10:35:20.9928188Z
2025-12-04T10:35:20.9929122Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9929766Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9930229Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9930700Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9931110Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.9931472Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.9931971Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9932421Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9932845Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9933271Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9933740Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9934204Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9934661Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9934958Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.9936550Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9937080Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9937812Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9938275Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:20.9938979Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9939645Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9940363Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9940787Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:20.9941497Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9942040Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9942772Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9943466Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9944186Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9944775Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9945487Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:20.9946161Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:20.9946924Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9947222Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.9947792Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:20.9948656Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.9949103Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9950029Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:20.9950562Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:20.9951352Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:20.9952009Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:20.9952759Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:20.9953421Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:20.9953940Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:20.9954584Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9955043Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9955520Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9955985Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.9956344Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.9956843Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9957285Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9957637Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:20.9958376Z E1204 10:32:48.760000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:20.9958485Z ('RERUN', {'yellow': True}) [1.7678s] [100%]
2025-12-04T10:35:20.9959417Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:20.9960058Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:20.9960561Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:20.9961033Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:20.9961448Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:20.9961851Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:20.9962351Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:20.9962831Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:20.9963253Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:20.9963685Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:20.9964109Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:20.9964566Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:20.9965030Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:20.9965324Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:20.9966922Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:20.9967376Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:20.9968108Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9968530Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:20.9969235Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:20.9969834Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:20.9970598Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:20.9971025Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:20.9971735Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:20.9972343Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:20.9973076Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:20.9973803Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:20.9974515Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:20.9975143Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:20.9975898Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:20.9976490Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:20.9977245Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9977541Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9978120Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:20.9978429Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9978875Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9979813Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:20.9980347Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:20.9981101Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:20.9981676Z E1204 
10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:20.9982465Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:20.9983120Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:20.9983643Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:20.9984282Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9984781Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9985254Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9985676Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9986070Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9986573Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9987050Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 
2025-12-04T10:35:20.9987405Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:20.9988100Z E1204 10:32:49.123000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:20.9988212Z ('RERUN', {'yellow': True}) [0.3298s] [100%] 2025-12-04T10:35:20.9989152Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:20.9989789Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:20.9990246Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:20.9990722Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:20.9991137Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:20.9991499Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:20.9991995Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:20.9992435Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:20.9992860Z E1204 10:32:49.454000 93505 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:20.9993293Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:20.9993714Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:20.9994217Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:20.9994680Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:20.9994976Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:20.9996569Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:20.9997146Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:20.9997908Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:20.9998332Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:20.9999070Z E1204 
10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:20.9999674Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.0000393Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0000821Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0001529Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.0002073Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.0002806Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.0003498Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.0004214Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.0004802Z E1204 10:32:49.454000 93505 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.0005514Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.0006133Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.0006888Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0007184Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0007927Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.0008233Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0008779Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0009730Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0010354Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0011164Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0011796Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0012540Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0013197Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0013713Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.0014352Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0014807Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0015284Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0015698Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0016055Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0016557Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0016993Z E1204 
10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0017343Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.0018036Z E1204 10:32:49.454000 93505 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0018119Z FAILED [0.3293s] [100%] 2025-12-04T10:35:21.0018124Z 2025-12-04T10:35:21.0018329Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.0018596Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0018701Z Traceback (most recent call last): 2025-12-04T10:35:21.0019011Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0019159Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0019606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0019827Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0020308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0020472Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0020904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0021031Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0021520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0021794Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, 
graph_kwargs) 2025-12-04T10:35:21.0022330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0022502Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0023057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0023192Z return self._compile_to_module() 2025-12-04T10:35:21.0023721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0023871Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0024312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0024417Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0024836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0025034Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0025538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0025642Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0026076Z File "/tmp/tmpwp7kngc6/ap/capzrkg6dqv6xdacrwaqz3rrd7odavimxjzojflop2yh27s4yo2c.py", line 51, in <module> 2025-12-04T10:35:21.0026475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0026562Z kernel.precompile( 2025-12-04T10:35:21.0027038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0027132Z self._precompile_worker() 2025-12-04T10:35:21.0027635Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0027786Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0028293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0028457Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0028843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0029108Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0029485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0029766Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0029958Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0030231Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0030328Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0030486Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0030575Z xmask = xindex < xnumel 2025-12-04T10:35:21.0030648Z x0 = xindex 2025-12-04T10:35:21.0030787Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0030881Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0030948Z ^ 2025-12-04T10:35:21.0031278Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0031283Z 2025-12-04T10:35:21.0031929Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0031935Z 2025-12-04T10:35:21.0031939Z 2025-12-04T10:35:21.0032122Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0032838Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T10:35:21.0032845Z 2025-12-04T10:35:21.0033066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0033246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0033330Z frames [('total', 1)] 2025-12-04T10:35:21.0033428Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0033827Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0034011Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0034089Z graph_break [] 2025-12-04T10:35:21.0034349Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0034449Z Traceback (most recent call last): 2025-12-04T10:35:21.0034765Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0034865Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0035283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0035488Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0035976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0036138Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0036569Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0036685Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0037141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0037411Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.0037854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0037973Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0038424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0038527Z return self._compile_to_module() 2025-12-04T10:35:21.0038937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0039070Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0039507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0039613Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0040032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0040265Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0040759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0040867Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0041269Z File "/tmp/tmp_pbmoxbe/a3/ca3u2ajjun42444g6dvyz6egrpl3erlmmvy5h745rrbezfqzfbrp.py", line 51, in <module> 2025-12-04T10:35:21.0041703Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0041788Z kernel.precompile( 2025-12-04T10:35:21.0042256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0042392Z self._precompile_worker() 2025-12-04T10:35:21.0042894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0043042Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0043542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0043706Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0044085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0044290Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0044661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0044948Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0045140Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0045411Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0045509Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0045619Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0045709Z xmask = xindex < xnumel 2025-12-04T10:35:21.0045779Z x0 = xindex 2025-12-04T10:35:21.0045915Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0046011Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0046084Z ^ 
2025-12-04T10:35:21.0046411Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0046416Z
2025-12-04T10:35:21.0047024Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0047032Z
2025-12-04T10:35:21.0047036Z
2025-12-04T10:35:21.0047216Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0047887Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0047892Z
2025-12-04T10:35:21.0051990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0052204Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0052291Z frames [('total', 1)]
2025-12-04T10:35:21.0052392Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0052800Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0052991Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0053080Z graph_break []
2025-12-04T10:35:21.0053257Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0053388Z frames [('total', 1)]
2025-12-04T10:35:21.0053487Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0053667Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0054064Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0054151Z graph_break []
2025-12-04T10:35:21.0054269Z =================================== FAILURES ===================================
2025-12-04T10:35:21.0054583Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0054684Z Traceback (most recent call last):
2025-12-04T10:35:21.0055003Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0055184Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0055599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0055808Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0056249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0056411Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0056851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0056973Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0057425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0057704Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0058147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0058276Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0058681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0058782Z return self._compile_to_module()
2025-12-04T10:35:21.0059269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0059407Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0059846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0059962Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0060381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0060582Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0061079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0061181Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0061625Z File "/tmp/tmpsmypz3ex/ia/ciawyuxkqmtttxywd36rbim2duyljfnbmfmvp2sqsqc2jltyyr3q.py", line 51, in <module>
2025-12-04T10:35:21.0062065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0062168Z kernel.precompile(
2025-12-04T10:35:21.0062641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0062737Z self._precompile_worker()
2025-12-04T10:35:21.0063249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0063394Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0063945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0064113Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0064492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0064702Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0065112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0065394Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0065650Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0065954Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0066059Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0066172Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0066259Z xmask = xindex < xnumel
2025-12-04T10:35:21.0066340Z x0 = xindex
2025-12-04T10:35:21.0066477Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0066575Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0066659Z ^
2025-12-04T10:35:21.0066987Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0066992Z
2025-12-04T10:35:21.0067600Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0067614Z
2025-12-04T10:35:21.0067618Z
2025-12-04T10:35:21.0067796Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0068469Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0068477Z
2025-12-04T10:35:21.0068711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0068890Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0068985Z frames [('total', 1)]
2025-12-04T10:35:21.0069084Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0069485Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0069678Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0069758Z graph_break []
2025-12-04T10:35:21.0069935Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0070023Z frames [('total', 1)]
2025-12-04T10:35:21.0070116Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0070305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0070697Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0070777Z graph_break []
2025-12-04T10:35:21.0071009Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0071097Z frames [('total', 1)]
2025-12-04T10:35:21.0071189Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0071379Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0071767Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0071846Z graph_break []
2025-12-04T10:35:21.0072408Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.xml -
2025-12-04T10:35:21.0072591Z =========================== short test summary info ============================
2025-12-04T10:35:21.0073247Z FAILED [0.3293s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0073519Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0073621Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0073743Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0073877Z xmask = xindex < xnumel
2025-12-04T10:35:21.0073962Z x0 = xindex
2025-12-04T10:35:21.0074099Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0074194Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0074308Z ^
2025-12-04T10:35:21.0074635Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0074640Z
2025-12-04T10:35:21.0075247Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0075258Z
2025-12-04T10:35:21.0075262Z
2025-12-04T10:35:21.0075441Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0076112Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0076116Z
2025-12-04T10:35:21.0076350Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0076498Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:35:21.0076672Z ================== 1 failed, 187 deselected, 2 rerun in 2.46s ==================
2025-12-04T10:35:21.0076754Z Got exit code 1
2025-12-04T10:35:21.0076845Z Retrying single test...
2025-12-04T10:35:21.0077255Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.xml
2025-12-04T10:35:21.0077386Z ============================= test session starts ==============================
2025-12-04T10:35:21.0077679Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:35:21.0077776Z cachedir: .pytest_cache
2025-12-04T10:35:21.0078222Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:35:21.0078328Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:35:21.0078414Z configfile: pytest.ini
2025-12-04T10:35:21.0078870Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:35:21.0079068Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T10:35:21.0079665Z stepcurrent: skipping 61 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16
2025-12-04T10:35:21.0079760Z Running 1 items in this shard
2025-12-04T10:35:21.0079765Z
2025-12-04T10:35:21.0080750Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0081399Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0081869Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0082344Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0082805Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0083167Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0083667Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0084155Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0084579Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0085056Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0085480Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0085947Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0086411Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0086708Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0088254Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0088713Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0089453Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0089879Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0090592Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0091195Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0091916Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0092389Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0093105Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0093645Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0094380Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0095116Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0095831Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0096483Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0097246Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0097825Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0098586Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0098884Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0099548Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0099846Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0100298Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0101188Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0101720Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0102484Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0103053Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0103798Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0104452Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0105016Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0105668Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0106124Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0106611Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0107067Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0107429Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0108205Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0108728Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0109077Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.0109827Z E1204 10:32:59.329000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0109937Z ('RERUN', {'yellow': True}) [1.7682s] [100%]
2025-12-04T10:35:21.0110875Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0111512Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0111972Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0112442Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0112857Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0113218Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0113718Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0114165Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0114590Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0115022Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0115446Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0115905Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0116375Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0116733Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0118276Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0118786Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0119527Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0119957Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0120696Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0121339Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0122058Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0122489Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0123202Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0123745Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0124481Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0125179Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0125895Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0126484Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0127200Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0127779Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0128535Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0128878Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0129458Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0129755Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0130207Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0131096Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0131670Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0132466Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0133039Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0133895Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0134755Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0135374Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0136020Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0136477Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0136954Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0137370Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0137731Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0138242Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0138684Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0139087Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.0139787Z E1204 10:32:59.691000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0139900Z ('RERUN', {'yellow': True}) [0.3289s] [100%]
2025-12-04T10:35:21.0140835Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0141537Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0142003Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0142473Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0142894Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0143321Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0143816Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0144266Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0144728Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0145164Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0145627Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0146087Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0146551Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0146851Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0148386Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0148841Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0149680Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0150113Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0150825Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0151428Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0152150Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0152581Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0153337Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0153880Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0154610Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0155307Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0156069Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0156695Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0157414Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0158035Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0158794Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0159092Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0159671Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0159973Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0160423Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0161320Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0161859Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0162618Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0163190Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0163932Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0164590Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0165108Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.0165799Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0166258Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0166731Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0167149Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0167548Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0168048Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0168487Z E1204 
10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0168888Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.0169589Z E1204 10:33:00.023000 93686 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0169710Z FAILED [0.3302s] [100%] 2025-12-04T10:35:21.0169715Z 2025-12-04T10:35:21.0169845Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.0170111Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0170223Z Traceback (most recent call last): 2025-12-04T10:35:21.0170533Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0170639Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0171068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0171282Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0171726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0171892Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0172326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0172455Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0172909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0173183Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, 
graph_kwargs) 2025-12-04T10:35:21.0173634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0173760Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0174175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0174274Z return self._compile_to_module() 2025-12-04T10:35:21.0174683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0174831Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0175271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0175383Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0175969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0176166Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0176674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0176777Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0177188Z File "/tmp/tmpv1kk58_t/23/c23rvff6ei43cri4cmsllnhtvyo3jgw6uba26koyxxnkvhj5fise.py", line 51, in 2025-12-04T10:35:21.0177591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0177732Z kernel.precompile( 2025-12-04T10:35:21.0178208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0178309Z self._precompile_worker() 2025-12-04T10:35:21.0178818Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0178968Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0179566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0179739Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0180159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0180364Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0180746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0181027Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0181224Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0181500Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0181601Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0181718Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0181805Z xmask = xindex < xnumel 2025-12-04T10:35:21.0181876Z x0 = xindex 2025-12-04T10:35:21.0182021Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0182117Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0182185Z ^ 2025-12-04T10:35:21.0182521Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0182528Z 2025-12-04T10:35:21.0183136Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0183141Z 2025-12-04T10:35:21.0183145Z 2025-12-04T10:35:21.0183330Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0184004Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T10:35:21.0184009Z 2025-12-04T10:35:21.0184235Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0184415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0184499Z frames [('total', 1)] 2025-12-04T10:35:21.0184594Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0184994Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0185178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0185262Z graph_break [] 2025-12-04T10:35:21.0185603Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0185714Z Traceback (most recent call last): 2025-12-04T10:35:21.0186076Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0186180Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0186594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0186805Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0187242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0187454Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0187891Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0188017Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0188472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0188789Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.0189235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0189395Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0189806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0189906Z return self._compile_to_module() 2025-12-04T10:35:21.0190313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0190452Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0190890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0190993Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0191419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0191615Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0192120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0192223Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0192634Z File "/tmp/tmp3oz_wrbl/37/c373tt76ok5bcbnefwvhgadbdhogznnoubl3wkrtxrqgapg67i35.py", line 51, in 2025-12-04T10:35:21.0193040Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0193130Z kernel.precompile( 2025-12-04T10:35:21.0193618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0193712Z self._precompile_worker() 2025-12-04T10:35:21.0194220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0194371Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0194877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0195049Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0195436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0195640Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0196077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0196359Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0196560Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0196833Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0196932Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0197047Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0197135Z xmask = xindex < xnumel 2025-12-04T10:35:21.0197205Z x0 = xindex 2025-12-04T10:35:21.0197396Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0197494Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0197563Z ^ 
2025-12-04T10:35:21.0197892Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0197897Z 2025-12-04T10:35:21.0198503Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0198507Z 2025-12-04T10:35:21.0198551Z 2025-12-04T10:35:21.0198735Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0199406Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T10:35:21.0199450Z 2025-12-04T10:35:21.0199672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0199857Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0199936Z frames [('total', 1)] 2025-12-04T10:35:21.0200030Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0200437Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0200620Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0200701Z graph_break [] 2025-12-04T10:35:21.0200882Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0200963Z frames [('total', 1)] 2025-12-04T10:35:21.0201056Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0201237Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0201633Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0201712Z graph_break [] 2025-12-04T10:35:21.0201830Z =================================== FAILURES 
=================================== 2025-12-04T10:35:21.0202095Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0202193Z Traceback (most recent call last): 2025-12-04T10:35:21.0202508Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0202618Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0203029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0203245Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0203679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0203836Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0204269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0204386Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0204884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0205155Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.0205595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0205722Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0206126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0206224Z return self._compile_to_module() 2025-12-04T10:35:21.0206639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0206813Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0207254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0207362Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0207955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0208228Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0208725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0208884Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0209320Z File "/tmp/tmpidncq7tf/az/cazpasr2x2aohewuzq3ri4zqtffqcm3ol65dvh54dnpnkvy7cske.py", line 51, in 2025-12-04T10:35:21.0209713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0209806Z kernel.precompile( 2025-12-04T10:35:21.0210275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0210369Z self._precompile_worker() 2025-12-04T10:35:21.0210876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0211026Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0211533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0211698Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0212074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0212279Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:21.0212648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0212930Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0213128Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0213395Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0213495Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0213604Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0213687Z xmask = xindex < xnumel 2025-12-04T10:35:21.0213770Z x0 = xindex 2025-12-04T10:35:21.0213905Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0213996Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0214072Z ^ 2025-12-04T10:35:21.0214397Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0214402Z 2025-12-04T10:35:21.0215068Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0215073Z 2025-12-04T10:35:21.0215077Z 2025-12-04T10:35:21.0215255Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0215980Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T10:35:21.0215992Z 2025-12-04T10:35:21.0216216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0216390Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0216528Z frames [('total', 1)] 2025-12-04T10:35:21.0216618Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0217013Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0217200Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0217280Z graph_break [] 2025-12-04T10:35:21.0217458Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0217541Z frames [('total', 1)] 2025-12-04T10:35:21.0217673Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0217858Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0218255Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0218400Z graph_break [] 2025-12-04T10:35:21.0218577Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0218662Z frames [('total', 1)] 2025-12-04T10:35:21.0218756Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0218938Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0219374Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0219461Z graph_break [] 2025-12-04T10:35:21.0220014Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.xml - 2025-12-04T10:35:21.0220152Z =========================== short test summary info ============================ 2025-12-04T10:35:21.0220806Z FAILED [0.3302s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0221077Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0221188Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0221296Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0221380Z xmask = xindex < xnumel 2025-12-04T10:35:21.0221454Z x0 = xindex 2025-12-04T10:35:21.0221591Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0221694Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0221769Z ^ 2025-12-04T10:35:21.0222098Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0222105Z 2025-12-04T10:35:21.0222713Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0222720Z 2025-12-04T10:35:21.0222723Z 2025-12-04T10:35:21.0222903Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0223575Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T10:35:21.0223584Z 2025-12-04T10:35:21.0223805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0224000Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.0224173Z ================== 1 failed, 187 deselected, 2 rerun in 2.46s ================== 2025-12-04T10:35:21.0224250Z Got exit code 1 2025-12-04T10:35:21.0224709Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T10:35:21.0225073Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:21.0225469Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.xml 2025-12-04T10:35:21.0225654Z ============================= test session starts ============================== 2025-12-04T10:35:21.0225943Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.0226034Z cachedir: .pytest_cache 2025-12-04T10:35:21.0226481Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.0226580Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.0226667Z configfile: pytest.ini 2025-12-04T10:35:21.0227176Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:21.0227370Z collecting ... collected 188 items / 62 deselected / 126 selected
2025-12-04T10:35:21.0227531Z stepcurrent: skipping 62 already run items.
2025-12-04T10:35:21.0227624Z Running 126 items in this shard
2025-12-04T10:35:21.0227628Z
2025-12-04T10:35:21.0228581Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0229229Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0229691Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0230171Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0230585Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0230942Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0231452Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0231896Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0232322Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0232748Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0233168Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0233629Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0234088Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0234383Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0236052Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0236509Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0237277Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0237707Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0238447Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0239045Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0239822Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0240249Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0240966Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0241501Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0242235Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0242925Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0243633Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0244234Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0244948Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0245530Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0246278Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0246579Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0247194Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0247497Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0247950Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0248837Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0249418Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0250166Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0250780Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0251521Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0252213Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0252735Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0253372Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0253837Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0254310Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0254727Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0255088Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0255583Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0256075Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0256416Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.0257124Z E1204 10:33:09.854000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0257232Z ('RERUN', {'yellow': True}) [1.7593s] [ 0%]
2025-12-04T10:35:21.0258184Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0258828Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0259375Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0259857Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0260271Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0260631Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0261177Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0261619Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0262048Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0262542Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0262966Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0263464Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0263933Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0264236Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0265783Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0266289Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0267016Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0267445Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0268152Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0268749Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0269476Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0269903Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0270625Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0271204Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0271945Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0272636Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0273389Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0273988Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0274737Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0275317Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0276106Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0276414Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0276990Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0277285Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0277737Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0278623Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0279157Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0279909Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0280489Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0281234Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0281887Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0282407Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0283085Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0283543Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0284013Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0284428Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0284787Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0285327Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0285774Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0286119Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.0286858Z E1204 10:33:10.218000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0286966Z ('RERUN', {'yellow': True}) [0.3308s] [ 0%]
2025-12-04T10:35:21.0287953Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0288598Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0289056Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0289539Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0289951Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0290311Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0290818Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0291266Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0291698Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0292123Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0292556Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0293014Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0293471Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0293773Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0295360Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0295844Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0296596Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0297064Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0297772Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0298406Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0299213Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0299681Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0300398Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0300936Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0301672Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0302364Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0303078Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0303674Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0304395Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0304989Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0305742Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0306042Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0306616Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0306953Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0307417Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0308445Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0308985Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0309886Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0310508Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0311364Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0312061Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0312675Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0313363Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0313855Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0314360Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0314806Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0315194Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0315728Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0316206Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0316576Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.0317332Z E1204 10:33:10.548000 93867 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0317417Z FAILED [0.3282s] [ 0%]
2025-12-04T10:35:21.0317422Z
2025-12-04T10:35:21.0317553Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.0317853Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0317958Z Traceback (most recent call last):
2025-12-04T10:35:21.0318294Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0318410Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0318848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0319133Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0319600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0319769Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0320235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0320365Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0320850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0321181Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0321653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0321787Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0322221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0322363Z return self._compile_to_module()
2025-12-04T10:35:21.0322776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0322908Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0323390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0323497Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0323912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0324107Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0324602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0324712Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.0325139Z File "/tmp/tmptvikjk14/32/c32ufripxwlo6rki4djw6fc74de3sry7zb5alnkovhssl7x5mrna.py", line 51, in
2025-12-04T10:35:21.0325528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.0325622Z kernel.precompile(
2025-12-04T10:35:21.0326140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.0326237Z self._precompile_worker()
2025-12-04T10:35:21.0326744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.0326890Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.0327401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0327562Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0327940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0328142Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0328518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0328806Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0329001Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.0329268Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0329369Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0329524Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0329611Z xmask = xindex < xnumel
2025-12-04T10:35:21.0329687Z x0 = xindex
2025-12-04T10:35:21.0329823Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0329919Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0329989Z ^
2025-12-04T10:35:21.0330316Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0330324Z
2025-12-04T10:35:21.0330928Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.0330982Z
2025-12-04T10:35:21.0330986Z
2025-12-04T10:35:21.0331169Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.0331860Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16
2025-12-04T10:35:21.0331866Z
2025-12-04T10:35:21.0332087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.0332304Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.0332391Z frames [('total', 1)]
2025-12-04T10:35:21.0332483Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.0332923Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.0333106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.0333187Z graph_break []
2025-12-04T10:35:21.0333461Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _
2025-12-04T10:35:21.0333557Z Traceback (most recent call last):
2025-12-04T10:35:21.0333862Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.0333966Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.0334377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.0334586Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.0335017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.0335177Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.0335607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.0335727Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.0336230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.0336504Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.0336942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.0337065Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.0337466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.0337563Z return self._compile_to_module()
2025-12-04T10:35:21.0337974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.0338108Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.0338548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.0338649Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.0339165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.0343087Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.0343616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.0343724Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0344178Z File "/tmp/tmpb5vcbfw5/pw/cpwvrawzilgiwzwoqywcm7v4lxfs4vbircxc32axt5tqcf5jg5ns.py", line 51, in <module> 2025-12-04T10:35:21.0344575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0344742Z kernel.precompile( 2025-12-04T10:35:21.0345215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0345315Z self._precompile_worker() 2025-12-04T10:35:21.0345838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0345987Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0346548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0346717Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0347217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0347434Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0347810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0348096Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0348305Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0348574Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0348680Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0348791Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0348881Z xmask = xindex <
xnumel 2025-12-04T10:35:21.0348965Z x0 = xindex 2025-12-04T10:35:21.0349107Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0349203Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0349289Z ^ 2025-12-04T10:35:21.0349625Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0349631Z 2025-12-04T10:35:21.0350245Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0350250Z 2025-12-04T10:35:21.0350254Z 2025-12-04T10:35:21.0350436Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0351126Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0351136Z 2025-12-04T10:35:21.0351360Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0351546Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0351637Z frames [('total', 1)] 2025-12-04T10:35:21.0351732Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0352135Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0352324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0352407Z graph_break [] 2025-12-04T10:35:21.0352668Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0352752Z frames [('total', 1)] 2025-12-04T10:35:21.0352847Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0353038Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0353430Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0353514Z graph_break [] 2025-12-04T10:35:21.0353637Z =================================== FAILURES =================================== 2025-12-04T10:35:21.0353910Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0354057Z Traceback (most recent call last): 2025-12-04T10:35:21.0354379Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0354482Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0354905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0355114Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0355592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0355759Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0356239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0356368Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0356824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0357096Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.0357539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0357659Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0358075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0358175Z return 
self._compile_to_module() 2025-12-04T10:35:21.0358582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0358728Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0359174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0359282Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0359706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0359899Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0360413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0360519Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0360930Z File "/tmp/tmpj_bc2pgi/gr/cgr4gq5h7yxqi5zjnmckvmyxjq6btl52eoikluy6yoapefty7ywm.py", line 51, in <module> 2025-12-04T10:35:21.0361329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0361417Z kernel.precompile( 2025-12-04T10:35:21.0361886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0361988Z self._precompile_worker() 2025-12-04T10:35:21.0362496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0362701Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0363213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0363374Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0363765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0363969Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0364349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0364679Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0364871Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0365154Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0365253Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0365369Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0365504Z xmask = xindex < xnumel 2025-12-04T10:35:21.0365585Z x0 = xindex 2025-12-04T10:35:21.0365730Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0365828Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0365943Z ^ 2025-12-04T10:35:21.0366275Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0366279Z 2025-12-04T10:35:21.0366888Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0366893Z 2025-12-04T10:35:21.0366896Z 2025-12-04T10:35:21.0367082Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0367767Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0367772Z 2025-12-04T10:35:21.0367998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0368188Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0368280Z frames [('total', 1)] 2025-12-04T10:35:21.0368385Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0368790Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0368981Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0369063Z graph_break [] 2025-12-04T10:35:21.0369242Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0369326Z frames [('total', 1)] 2025-12-04T10:35:21.0369435Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0369615Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0370020Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0370100Z graph_break [] 2025-12-04T10:35:21.0370280Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0370374Z frames [('total', 1)] 2025-12-04T10:35:21.0370468Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0370650Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0371049Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0371134Z graph_break [] 2025-12-04T10:35:21.0371691Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.xml - 2025-12-04T10:35:21.0371900Z =========================== short test summary info ============================ 2025-12-04T10:35:21.0372582Z FAILED [0.3282s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0372858Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0372961Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0373073Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0373166Z xmask = xindex < xnumel 2025-12-04T10:35:21.0373286Z x0 = xindex 2025-12-04T10:35:21.0373430Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0373528Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0373597Z ^ 2025-12-04T10:35:21.0373938Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0373946Z 2025-12-04T10:35:21.0374589Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0374594Z 2025-12-04T10:35:21.0374598Z 2025-12-04T10:35:21.0374786Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0375474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0375520Z 2025-12-04T10:35:21.0375750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0375931Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.0376130Z ================== 1 failed, 62 deselected, 2 rerun in 2.45s =================== 2025-12-04T10:35:21.0376216Z Got exit code 1 2025-12-04T10:35:21.0376306Z Retrying single test... 2025-12-04T10:35:21.0376708Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.xml 2025-12-04T10:35:21.0376848Z ============================= test session starts ============================== 2025-12-04T10:35:21.0377138Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.0377233Z cachedir: .pytest_cache 2025-12-04T10:35:21.0377687Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.0377789Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.0377875Z configfile: pytest.ini 2025-12-04T10:35:21.0378343Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:21.0378527Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:21.0379224Z stepcurrent: skipping 62 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0379321Z Running 1 items in this shard 2025-12-04T10:35:21.0379325Z 2025-12-04T10:35:21.0380283Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.0380931Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0381398Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0381928Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0382349Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0382716Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0383214Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0383657Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0384132Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.0384561Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T10:35:21.0384993Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.0385489Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.0385972Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.0386364Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0387907Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.0388368Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0389098Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0389538Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0390242Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.0390853Z E1204 10:33:20.458000 94048 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.0391572Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0392000Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0392721Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.0393264Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.0394053Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.0394749Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.0395472Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.0396107Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.0396816Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.0397443Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.0398195Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0398543Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0399121Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.0399430Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0399878Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0400769Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0401306Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0402055Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0402636Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.0403384Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0404052Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0404576Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.0405216Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0405682Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0406246Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0406679Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0407048Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0407547Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0408212Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0408558Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T10:35:21.0409268Z E1204 10:33:20.458000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0409376Z ('RERUN', {'yellow': True}) [1.7732s] [100%] 2025-12-04T10:35:21.0410406Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.0411093Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0411555Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0412035Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0412451Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0412820Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0413317Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0413759Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0414195Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.0414626Z E1204 10:33:20.823000 94048 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.0415060Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.0415525Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.0416013Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.0416346Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0417944Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.0418412Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0419176Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0419610Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0420375Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.0420984Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.0421741Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0422170Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0422927Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.0423465Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.0424204Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.0424904Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.0425620Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.0426207Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 
2025-12-04T10:35:21.0426918Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.0427508Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.0428256Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0428558Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0429130Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.0429433Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0429879Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0430809Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0431344Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0432096Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0432725Z E1204 
10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0433470Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0434197Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0434721Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.0435486Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0436113Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0436724Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0437146Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0437512Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0438016Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0438481Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 
2025-12-04T10:35:21.0438828Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.0439536Z E1204 10:33:20.823000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0439645Z ('RERUN', {'yellow': True}) [0.3319s] [100%] 2025-12-04T10:35:21.0440601Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.0441246Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0441708Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0442190Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0442671Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0443042Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0443540Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0443992Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0444428Z E1204 10:33:21.154000 94048 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.0444898Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.0445339Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.0445801Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.0446300Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.0446606Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0448186Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.0448645Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0449370Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0449902Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0450612Z E1204 
10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.0451224Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.0451945Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0452373Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0453098Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.0453642Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.0454390Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.0455144Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.0455874Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.0456473Z E1204 10:33:21.154000 94048 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.0457237Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.0457833Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.0458642Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0458956Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0459633Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.0459951Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0460406Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0461295Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0461841Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0462600Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0463190Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0463935Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0464602Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0465126Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.0465771Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0466246Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0466723Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0467194Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0467568Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0468071Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0468525Z E1204 
10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0468911Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.0469618Z E1204 10:33:21.154000 94048 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0469700Z FAILED [0.3292s] [100%] 2025-12-04T10:35:21.0469707Z 2025-12-04T10:35:21.0469827Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.0470150Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0470252Z Traceback (most recent call last): 2025-12-04T10:35:21.0470572Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0470717Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0471133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0471358Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0471793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0471966Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0472404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0472526Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0472994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0473263Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, 
graph_kwargs) 2025-12-04T10:35:21.0473706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0473837Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0474245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0474350Z return self._compile_to_module() 2025-12-04T10:35:21.0474763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0474895Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0475342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0475456Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0475919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0476136Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0476633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0476744Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0477180Z File "/tmp/tmpvy7pjtjg/nl/cnlezz4bkmjgo3cdm3hubkxdvfz6nhi6w5vdckr2n4n6gqyhfvo5.py", line 51, in <module> 2025-12-04T10:35:21.0477623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0477720Z kernel.precompile( 2025-12-04T10:35:21.0478192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0478295Z self._precompile_worker() 2025-12-04T10:35:21.0478798Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0478948Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0479526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0479689Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0480076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0480280Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0480700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0480987Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0481184Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0481490Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0481601Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0481716Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0481811Z xmask = xindex < xnumel 2025-12-04T10:35:21.0481885Z x0 = xindex 2025-12-04T10:35:21.0482020Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0482124Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0482198Z ^ 2025-12-04T10:35:21.0482526Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0482531Z 2025-12-04T10:35:21.0483154Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0483159Z 2025-12-04T10:35:21.0483166Z 2025-12-04T10:35:21.0483345Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0484042Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0484049Z 2025-12-04T10:35:21.0484273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0484461Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0484548Z frames [('total', 1)] 2025-12-04T10:35:21.0484641Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0485050Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0485239Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0485318Z graph_break [] 2025-12-04T10:35:21.0485601Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0485701Z Traceback (most recent call last): 2025-12-04T10:35:21.0486049Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0486174Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0486594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0486809Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0487295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0487460Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0487900Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0488018Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0488477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0488788Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.0489225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0489347Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0489756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0489856Z return self._compile_to_module() 2025-12-04T10:35:21.0490300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0490434Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0490916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0491022Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0491445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0491638Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0492134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0492237Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0492673Z File "/tmp/tmpft9gncmr/ky/ckykqdr43bjcvvjgmkbi3r4vji2rl6yinuvnqvudoddwhf3lctm2.py", line 51, in <module> 2025-12-04T10:35:21.0493068Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0493157Z kernel.precompile( 2025-12-04T10:35:21.0493631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0493727Z self._precompile_worker() 2025-12-04T10:35:21.0494231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0494378Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0494889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0495054Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0495432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0495638Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0496005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0496295Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0496488Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0496754Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0496852Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0496966Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0497094Z xmask = xindex < xnumel 2025-12-04T10:35:21.0497172Z x0 = xindex 2025-12-04T10:35:21.0497308Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0497409Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0497481Z ^ 
2025-12-04T10:35:21.0497806Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0497813Z 2025-12-04T10:35:21.0498422Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0498473Z 2025-12-04T10:35:21.0498477Z 2025-12-04T10:35:21.0498660Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0499401Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0499409Z 2025-12-04T10:35:21.0499630Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0499850Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0499933Z frames [('total', 1)] 2025-12-04T10:35:21.0500024Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0500428Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0500653Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0500730Z graph_break [] 2025-12-04T10:35:21.0500908Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0500988Z frames [('total', 1)] 2025-12-04T10:35:21.0501077Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0501259Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0501652Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0501732Z graph_break [] 2025-12-04T10:35:21.0501851Z =================================== FAILURES 
=================================== 2025-12-04T10:35:21.0502122Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0502226Z Traceback (most recent call last): 2025-12-04T10:35:21.0502541Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0502640Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0503052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0503256Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0503690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0503849Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0504280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0504399Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0504847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0505117Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.0505557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0505676Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0506126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0506271Z return self._compile_to_module() 2025-12-04T10:35:21.0506682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0506822Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0507257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0507366Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0507939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0508210Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0508707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0508808Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0509240Z File "/tmp/tmp9i36wj3s/in/cintglatnz3hxbsk7lef7gw5ajiiljp2an43jw3kh3lgbskpyppc.py", line 51, in <module> 2025-12-04T10:35:21.0509689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0509776Z kernel.precompile( 2025-12-04T10:35:21.0510246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0510393Z self._precompile_worker() 2025-12-04T10:35:21.0510897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0511046Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0511547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0511721Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0512098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0512299Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:21.0512674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0512950Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0513144Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0513410Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0513508Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0513618Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0513703Z xmask = xindex < xnumel 2025-12-04T10:35:21.0513773Z x0 = xindex 2025-12-04T10:35:21.0513920Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0514014Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0514083Z ^ 2025-12-04T10:35:21.0514412Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0514417Z 2025-12-04T10:35:21.0515021Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0515028Z 2025-12-04T10:35:21.0515032Z 2025-12-04T10:35:21.0515218Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0515916Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0515921Z 2025-12-04T10:35:21.0516180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0516443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0516528Z frames [('total', 1)] 2025-12-04T10:35:21.0516621Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0517019Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0517201Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0517284Z graph_break [] 2025-12-04T10:35:21.0517460Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0517591Z frames [('total', 1)] 2025-12-04T10:35:21.0517683Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0517863Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0518255Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0518330Z graph_break [] 2025-12-04T10:35:21.0518504Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0518587Z frames [('total', 1)] 2025-12-04T10:35:21.0518796Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0518981Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0519377Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0519494Z graph_break [] 2025-12-04T10:35:21.0520051Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.xml - 2025-12-04T10:35:21.0520193Z =========================== short test summary info ============================ 2025-12-04T10:35:21.0520864Z FAILED [0.3292s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0521139Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0521239Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0521355Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0521440Z xmask = xindex < xnumel 2025-12-04T10:35:21.0521511Z x0 = xindex 2025-12-04T10:35:21.0521647Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0521749Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0521822Z ^ 2025-12-04T10:35:21.0522148Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0522156Z 2025-12-04T10:35:21.0522758Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0522763Z 2025-12-04T10:35:21.0522767Z 2025-12-04T10:35:21.0522951Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0523643Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0523648Z 2025-12-04T10:35:21.0523872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0524020Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.0524184Z ================== 1 failed, 187 deselected, 2 rerun in 2.47s ================== 2025-12-04T10:35:21.0524265Z Got exit code 1 2025-12-04T10:35:21.0524348Z Retrying single test... 2025-12-04T10:35:21.0524751Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.xml 2025-12-04T10:35:21.0524883Z ============================= test session starts ============================== 2025-12-04T10:35:21.0525220Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.0525311Z cachedir: .pytest_cache 2025-12-04T10:35:21.0525761Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.0525862Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.0525950Z configfile: pytest.ini 2025-12-04T10:35:21.0526410Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:21.0526596Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:21.0527256Z stepcurrent: skipping 62 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0527353Z Running 1 items in this shard 2025-12-04T10:35:21.0527357Z 2025-12-04T10:35:21.0528352Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.0528998Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0529500Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0529971Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0530382Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0530748Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0531246Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0531689Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0532113Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.0532537Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T10:35:21.0532962Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0533419Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0533877Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0534174Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0535707Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0536167Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0536942Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0537375Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0538074Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0538716Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0539478Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0539903Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0540660Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0541238Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0541971Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0542664Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0543380Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0543967Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0544690Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0545268Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0546022Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0546327Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0546900Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0547198Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0547645Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0548573Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0549189Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0549940Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0550517Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0551302Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0551957Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0552511Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0553156Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0553677Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0554148Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0554568Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0554927Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0555432Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0555870Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0556215Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.0556910Z E1204 10:33:30.963000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0557016Z ('RERUN', {'yellow': True}) [1.7634s] [100%]
2025-12-04T10:35:21.0557965Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0558600Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0559059Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0559529Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0559947Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0560309Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0560852Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0561295Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0561717Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0562145Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0562610Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0563066Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0563525Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0563856Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0565391Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0565888Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0566617Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0567046Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0567743Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0568346Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0569063Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0569491Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0570201Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0570734Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0571469Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0572160Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0572917Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0573507Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0574221Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0574842Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0575590Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0575928Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0576554Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0576853Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0577343Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0578230Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0578759Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0579556Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0580131Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0580874Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0581529Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0582046Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0582689Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0583143Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0583613Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0584032Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0584433Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0584934Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0585372Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0585716Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^
2025-12-04T10:35:21.0586466Z E1204 10:33:31.327000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0586613Z ('RERUN', {'yellow': True}) [0.3311s] [100%]
2025-12-04T10:35:21.0587568Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0
2025-12-04T10:35:21.0588241Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0588702Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0589208Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0589621Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0589985Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0590481Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0590927Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.0591347Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32)
2025-12-04T10:35:21.0591774Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5)
2025-12-04T10:35:21.0592197Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32)
2025-12-04T10:35:21.0592658Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask)
2025-12-04T10:35:21.0593125Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask)
2025-12-04T10:35:21.0593418Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0594954Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86}
2025-12-04T10:35:21.0595408Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0596180Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0596609Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0597312Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to
2025-12-04T10:35:21.0597921Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic)
2025-12-04T10:35:21.0598678Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper
2025-12-04T10:35:21.0599106Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs)
2025-12-04T10:35:21.0599884Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast
2025-12-04T10:35:21.0600418Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding)
2025-12-04T10:35:21.0601190Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast
2025-12-04T10:35:21.0601880Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty)
2025-12-04T10:35:21.0602593Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir
2025-12-04T10:35:21.0603183Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape)
2025-12-04T10:35:21.0603898Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir
2025-12-04T10:35:21.0604478Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. '
2025-12-04T10:35:21.0605224Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.0605527Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0606149Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception:
2025-12-04T10:35:21.0606444Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0]
2025-12-04T10:35:21.0606891Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last):
2025-12-04T10:35:21.0607979Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.0608652Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.0609414Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.0609988Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.0610731Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.0611447Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.0611967Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11:
2025-12-04T10:35:21.0612670Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.0613124Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.0613650Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.0614072Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel
2025-12-04T10:35:21.0614426Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex
2025-12-04T10:35:21.0614925Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2025-12-04T10:35:21.0615363Z E1204
10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0615704Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.0616452Z E1204 10:33:31.658000 94229 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0616533Z FAILED [0.3301s] [100%] 2025-12-04T10:35:21.0616538Z 2025-12-04T10:35:21.0616657Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.0616931Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0617030Z Traceback (most recent call last): 2025-12-04T10:35:21.0617346Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0617448Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0617860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0618071Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0618504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0618671Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0619152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0619275Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0619775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0620048Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, 
graph_kwargs) 2025-12-04T10:35:21.0620493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0620612Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0621016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0621118Z return self._compile_to_module() 2025-12-04T10:35:21.0621568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0621711Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0622144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0622251Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0622671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0622903Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0623398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0623543Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0624045Z File "/tmp/tmpzukadt1x/xo/cxoxazj6pg7amgxbqsf4t4bkpb4hlh74aydvwarqei67mggxclsd.py", line 51, in <module> 2025-12-04T10:35:21.0624604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0625447Z kernel.precompile( 2025-12-04T10:35:21.0626263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0627173Z self._precompile_worker() 2025-12-04T10:35:21.0628163Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0638133Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0639299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0640506Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0641397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0642456Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0643477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0644662Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0645442Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0646081Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0646571Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0646896Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0647204Z xmask = xindex < xnumel 2025-12-04T10:35:21.0647447Z x0 = xindex 2025-12-04T10:35:21.0647710Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0648056Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0648317Z ^ 2025-12-04T10:35:21.0648766Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0649214Z 2025-12-04T10:35:21.0649929Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0650663Z 2025-12-04T10:35:21.0650667Z 2025-12-04T10:35:21.0650851Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0651842Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0652657Z 2025-12-04T10:35:21.0652884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0653417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0653850Z frames [('total', 1)] 2025-12-04T10:35:21.0654094Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0654675Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0655377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0655766Z graph_break [] 2025-12-04T10:35:21.0656226Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0656775Z Traceback (most recent call last): 2025-12-04T10:35:21.0657285Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0657836Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0658520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0659347Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0660121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0660847Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0661566Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0662239Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0662926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0663780Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.0664629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0665314Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0665961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0666585Z return self._compile_to_module() 2025-12-04T10:35:21.0667194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0667872Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0668561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0669224Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0669850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0670588Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0671405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0672136Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0672775Z File "/tmp/tmpcjx16ceo/fl/cflmujv3ks7wts63fowvlti4m252p46khmlznsr44g7jcvswctjo.py", line 51, in <module> 2025-12-04T10:35:21.0673777Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0674384Z kernel.precompile( 2025-12-04T10:35:21.0675005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0675696Z self._precompile_worker() 2025-12-04T10:35:21.0676391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0677171Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0677929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0678797Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0679464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0680177Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0680874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0681700Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0682305Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0682921Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0683410Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0683731Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0684045Z xmask = xindex < xnumel 2025-12-04T10:35:21.0684275Z x0 = xindex 2025-12-04T10:35:21.0684543Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0684884Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0685139Z ^ 
2025-12-04T10:35:21.0685589Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0686067Z 2025-12-04T10:35:21.0686705Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0687428Z 2025-12-04T10:35:21.0687432Z 2025-12-04T10:35:21.0687616Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0688604Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0689426Z 2025-12-04T10:35:21.0689654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0690194Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0690584Z frames [('total', 1)] 2025-12-04T10:35:21.0690821Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0691404Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0692121Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0692506Z graph_break [] 2025-12-04T10:35:21.0692816Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0693202Z frames [('total', 1)] 2025-12-04T10:35:21.0693448Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0693806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0694506Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0695103Z graph_break [] 2025-12-04T10:35:21.0695352Z =================================== FAILURES 
=================================== 2025-12-04T10:35:21.0695951Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T10:35:21.0696483Z Traceback (most recent call last): 2025-12-04T10:35:21.0696996Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0697534Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0698159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0698909Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0699712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0700473Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0701179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0701862Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0702549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0703434Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.0704275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0705002Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0705641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0706272Z return self._compile_to_module() 2025-12-04T10:35:21.0706886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0707549Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0708504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0709181Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0709836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0710563Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0711378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0712120Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0712786Z File "/tmp/tmpooxi8a7g/a7/ca7zyzfbflizizqma2de2nvotgrwa3o2dhuxblteripex2n33rzd.py", line 51, in <module> 2025-12-04T10:35:21.0713752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0714367Z kernel.precompile( 2025-12-04T10:35:21.0715008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0715699Z self._precompile_worker() 2025-12-04T10:35:21.0716403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0717183Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0717958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0718746Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0719421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0720124Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:21.0720934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0721714Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0722329Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0722916Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0723402Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0723731Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0724050Z xmask = xindex < xnumel 2025-12-04T10:35:21.0724291Z x0 = xindex 2025-12-04T10:35:21.0724620Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0724977Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0725244Z ^ 2025-12-04T10:35:21.0725702Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0726207Z 2025-12-04T10:35:21.0726820Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0727545Z 2025-12-04T10:35:21.0727610Z 2025-12-04T10:35:21.0727792Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0728770Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0729624Z 2025-12-04T10:35:21.0729853Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0730382Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0730764Z frames [('total', 1)] 2025-12-04T10:35:21.0731006Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0731576Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0732278Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0732660Z graph_break [] 2025-12-04T10:35:21.0732962Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0733378Z frames [('total', 1)] 2025-12-04T10:35:21.0733711Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0734192Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0735001Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0735597Z graph_break [] 2025-12-04T10:35:21.0735916Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0736326Z frames [('total', 1)] 2025-12-04T10:35:21.0736569Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0736929Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0737616Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0738203Z graph_break [] 2025-12-04T10:35:21.0738892Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.xml - 2025-12-04T10:35:21.0739762Z =========================== short test summary info ============================ 2025-12-04T10:35:21.0740712Z FAILED [0.3301s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0741770Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0742265Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0742589Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0742897Z xmask = xindex < xnumel 2025-12-04T10:35:21.0743210Z x0 = xindex 2025-12-04T10:35:21.0743469Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T10:35:21.0743812Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0744086Z ^ 2025-12-04T10:35:21.0744536Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0744989Z 2025-12-04T10:35:21.0745602Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0746326Z 2025-12-04T10:35:21.0746376Z 2025-12-04T10:35:21.0746558Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0747650Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0748460Z 2025-12-04T10:35:21.0748691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0749180Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.0749652Z ================== 1 failed, 187 deselected, 2 rerun in 2.46s ================== 2025-12-04T10:35:21.0750015Z Got exit code 1 2025-12-04T10:35:21.0750616Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T10:35:21.0751589Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:21.0752457Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.xml 2025-12-04T10:35:21.0753112Z ============================= test session starts ============================== 2025-12-04T10:35:21.0753652Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.0754156Z cachedir: .pytest_cache 2025-12-04T10:35:21.0754749Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.0755413Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.0755698Z configfile: pytest.ini 2025-12-04T10:35:21.0756343Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:21.0757107Z collecting ... collected 188 items / 63 deselected / 125 selected 2025-12-04T10:35:21.0757521Z stepcurrent: skipping 63 already run items. 2025-12-04T10:35:21.0757831Z Running 125 items in this shard 2025-12-04T10:35:21.0758005Z 2025-12-04T10:35:21.0758946Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.0760644Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0761850Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0762887Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0763898Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0764784Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0765802Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.0766848Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0767829Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.0768795Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = 
tmp0.to(tl.float8e5) 2025-12-04T10:35:21.0769771Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.0770809Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.0771840Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.0772715Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0774709Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.0776833Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0778130Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0779440Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0780685Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.0782101Z E1204 
10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.0783532Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0784786Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0786039Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.0787399Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.0788782Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.0790332Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.0791895Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.0793315Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.0794727Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.0796182Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.0797751Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0798914Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0799937Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.0800923Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0801779Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0803257Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0804788Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0806231Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0807667Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.0809339Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0810840Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0812124Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.0813398Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0814605Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0815655Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0816692Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0817579Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0818596Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.0819657Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0820557Z E1204 10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.0821714Z E1204 
10:33:41.437000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0822630Z ('RERUN', {'yellow': True}) [1.7683s] [ 0%] 2025-12-04T10:35:21.0823836Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.0825503Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0826815Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0827852Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0828904Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0829793Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0830714Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.0831724Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0832791Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.0833764Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 
= tmp0.to(tl.float8e5) 2025-12-04T10:35:21.0834723Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.0835721Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.0836806Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.0837678Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0839627Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.0841703Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0842989Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0844292Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0845538Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.0846944Z E1204 
10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.0848370Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0849670Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0850922Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.0852322Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.0853708Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.0855316Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.0856898Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.0858313Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.0859770Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.0861170Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.0862610Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0863768Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0864757Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.0865739Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0866640Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0868081Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0869610Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0871051Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0872487Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.0873909Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0875417Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0876738Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.0878009Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0879252Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0880289Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0881328Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0882218Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0883144Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.0884156Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0885053Z E1204 10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.0886264Z E1204 
10:33:41.796000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0887181Z ('RERUN', {'yellow': True}) [0.3264s] [ 0%] 2025-12-04T10:35:21.0888316Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.0889987Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0891197Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0892237Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0893234Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0894121Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0895053Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.0896102Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0897136Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.0898104Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 
= tmp0.to(tl.float8e5) 2025-12-04T10:35:21.0899139Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.0900143Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.0901174Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.0902092Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0904081Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.0906202Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0907487Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0908980Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0910226Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.0911646Z E1204 
10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.0913079Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.0914347Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.0915600Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.0916956Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.0918347Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.0919891Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.0921416Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.0922908Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.0924325Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.0925739Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.0927226Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0928444Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0929433Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.0930417Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.0931332Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.0932779Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0934358Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0935751Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0937179Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.0938610Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0940255Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0941632Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.0942998Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0944287Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0945399Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0946518Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.0947472Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.0948461Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.0949544Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0950562Z E1204 10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.0951804Z E1204 
10:33:42.126000 94410 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0952754Z FAILED [0.3286s] [ 0%] 2025-12-04T10:35:21.0952912Z 2025-12-04T10:35:21.0953042Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.0953578Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.0954132Z Traceback (most recent call last): 2025-12-04T10:35:21.0954669Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0955233Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0955899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0956678Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0957552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.0958270Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.0958974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.0959680Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.0965593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.0966526Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.0967431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.0968124Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.0968788Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.0969422Z return self._compile_to_module() 2025-12-04T10:35:21.0970033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.0970705Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.0971403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.0972079Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.0972714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.0973447Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.0974269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.0974993Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.0975645Z File "/tmp/tmptdtmccgn/ud/cudjexpqa6lgej5xcbof6spokj54sh2qfgizgsuucr3igvk34tb7.py", line 51, in 2025-12-04T10:35:21.0976598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.0977212Z kernel.precompile( 2025-12-04T10:35:21.0977834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.0978530Z self._precompile_worker() 2025-12-04T10:35:21.0979288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.0980069Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.0980902Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.0981701Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.0982367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.0983076Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.0983767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.0984590Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.0985197Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.0985768Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.0986264Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.0986584Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.0986899Z xmask = xindex < xnumel 2025-12-04T10:35:21.0987131Z x0 = xindex 2025-12-04T10:35:21.0987402Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.0987707Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.0987960Z ^ 2025-12-04T10:35:21.0988403Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.0988893Z 2025-12-04T10:35:21.0989512Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.0990235Z 2025-12-04T10:35:21.0990239Z 2025-12-04T10:35:21.0990430Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.0991397Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.0992194Z 2025-12-04T10:35:21.0992421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.0992948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.0993331Z frames [('total', 1)] 2025-12-04T10:35:21.0993572Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.0994155Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.0994870Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.0995248Z graph_break [] 2025-12-04T10:35:21.0995648Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.0996193Z Traceback (most recent call last): 2025-12-04T10:35:21.0996710Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.0997252Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.0997894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.0998647Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.0999404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1000132Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1000844Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1001527Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1002223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1003119Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1003962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1004647Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1005288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1005919Z return self._compile_to_module() 2025-12-04T10:35:21.1006535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1007245Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1008188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1008872Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1009520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1010246Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1011136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1011868Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1012562Z File "/tmp/tmp3z_eqw_z/hr/chroy2zbvb2tsvxktfphex7ndibrb3ibndqekzpkqncot5kjblwz.py", line 51, in 2025-12-04T10:35:21.1013484Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1014090Z kernel.precompile( 2025-12-04T10:35:21.1014723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1015405Z self._precompile_worker() 2025-12-04T10:35:21.1016158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1016942Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1017713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1018506Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1019230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1019993Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1020741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1021563Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1022204Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1022830Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1023354Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1023680Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1023999Z xmask = xindex < xnumel 2025-12-04T10:35:21.1024253Z x0 = xindex 2025-12-04T10:35:21.1024477Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1024784Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1025042Z ^ 
2025-12-04T10:35:21.1025492Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1025947Z 2025-12-04T10:35:21.1026558Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1027290Z 2025-12-04T10:35:21.1027361Z 2025-12-04T10:35:21.1027545Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1028517Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1029303Z 2025-12-04T10:35:21.1029532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1030074Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1030472Z frames [('total', 1)] 2025-12-04T10:35:21.1030780Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1031352Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1032058Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1032445Z graph_break [] 2025-12-04T10:35:21.1032750Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1033143Z frames [('total', 1)] 2025-12-04T10:35:21.1033387Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1033823Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1034727Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1035498Z graph_break [] 2025-12-04T10:35:21.1035752Z =================================== FAILURES 
=================================== 2025-12-04T10:35:21.1036259Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1036751Z Traceback (most recent call last): 2025-12-04T10:35:21.1037270Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1037820Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1038438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1039173Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1039934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1040639Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1041347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1042022Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1042710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1043548Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1044392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1045076Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1045741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1046397Z return self._compile_to_module() 2025-12-04T10:35:21.1047005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1047671Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1048452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1049117Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1049756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1050545Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1051353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1052077Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1052716Z File "/tmp/tmpmxeh4jqx/67/c676vt6qpogoiequijtsoelb2wziv53uqvl24wrwfsormxxhr24i.py", line 51, in 2025-12-04T10:35:21.1053668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1054262Z kernel.precompile( 2025-12-04T10:35:21.1054785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1054884Z self._precompile_worker() 2025-12-04T10:35:21.1055397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1055563Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1056138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1056317Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1056700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1057556Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:21.1057941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1058231Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1058432Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1058711Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1058817Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1058940Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1059084Z xmask = xindex < xnumel 2025-12-04T10:35:21.1059163Z x0 = xindex 2025-12-04T10:35:21.1059274Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1059369Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1059443Z ^ 2025-12-04T10:35:21.1059776Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1059781Z 2025-12-04T10:35:21.1060391Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1060396Z 2025-12-04T10:35:21.1060400Z 2025-12-04T10:35:21.1060592Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1061272Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1061277Z 2025-12-04T10:35:21.1061516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1061700Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1061790Z frames [('total', 1)] 2025-12-04T10:35:21.1061898Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1062297Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1062483Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1062572Z graph_break [] 2025-12-04T10:35:21.1062756Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1062846Z frames [('total', 1)] 2025-12-04T10:35:21.1063001Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1063186Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1063587Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1063670Z graph_break [] 2025-12-04T10:35:21.1063846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1063941Z frames [('total', 1)] 2025-12-04T10:35:21.1064032Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1064213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1064656Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1064736Z graph_break [] 2025-12-04T10:35:21.1065301Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.xml - 2025-12-04T10:35:21.1065444Z =========================== short test summary info ============================ 2025-12-04T10:35:21.1066138Z FAILED [0.3286s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1066412Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1066555Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1066680Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1066766Z xmask = xindex < xnumel 2025-12-04T10:35:21.1066845Z x0 = xindex 2025-12-04T10:35:21.1066949Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1067046Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1067117Z ^ 2025-12-04T10:35:21.1067448Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1067453Z 2025-12-04T10:35:21.1068062Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1068069Z 2025-12-04T10:35:21.1068073Z 2025-12-04T10:35:21.1068264Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1068931Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1068938Z 2025-12-04T10:35:21.1069174Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1069330Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.1069499Z ================== 1 failed, 63 deselected, 2 rerun in 2.46s =================== 2025-12-04T10:35:21.1069591Z Got exit code 1 2025-12-04T10:35:21.1069681Z Retrying single test... 2025-12-04T10:35:21.1070085Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.xml 2025-12-04T10:35:21.1070228Z ============================= test session starts ============================== 2025-12-04T10:35:21.1070526Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.1070624Z cachedir: .pytest_cache 2025-12-04T10:35:21.1071076Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.1071177Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.1071273Z configfile: pytest.ini 2025-12-04T10:35:21.1071735Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:21.1071925Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:21.1072579Z stepcurrent: skipping 63 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1072676Z Running 1 items in this shard 2025-12-04T10:35:21.1072683Z 2025-12-04T10:35:21.1073623Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1074270Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1074783Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1075258Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1075679Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1076089Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1076545Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1077029Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1077456Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1077883Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1078320Z 
E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1078780Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1079251Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1079552Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1081092Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1081554Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1082282Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1082713Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1083419Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1084072Z E1204 10:33:52.019000 94591 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1084794Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1085226Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1085943Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1086521Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1087257Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1087989Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1088705Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1089333Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1090046Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1090623Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1091374Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1091678Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1092249Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1092551Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1093004Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1093899Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1094427Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1095176Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1095753Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1096539Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1097197Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1097714Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1098355Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1098881Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1099411Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1099828Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1100231Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1100695Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1101176Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1101523Z E1204 10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1102226Z E1204 
10:33:52.019000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1102332Z ('RERUN', {'yellow': True}) [1.7859s] [100%] 2025-12-04T10:35:21.1103268Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1103906Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1104366Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1104838Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1105250Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1105612Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1106067Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1106507Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1106935Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1107362Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1108009Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1108584Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1109058Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1109357Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1110894Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1111408Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1112191Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1112623Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1113384Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1113985Z E1204 
10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1114705Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1115136Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1115846Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1116384Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1117121Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1117818Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1118533Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1119122Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1119838Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1120539Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1121288Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1121590Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1122166Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1122511Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1122961Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1123843Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1124416Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1125162Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1125779Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1126527Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1127179Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1127701Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1128349Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1128807Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1129275Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1129693Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1130054Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1130513Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1130953Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1131295Z E1204 10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1131996Z E1204 
10:33:52.381000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1132101Z ('RERUN', {'yellow': True}) [0.3278s] [100%] 2025-12-04T10:35:21.1133076Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1133715Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1134172Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1134685Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1135097Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1135467Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1135980Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1136453Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1136917Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1137344Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1137768Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1138226Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1138686Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1138981Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1140654Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1141115Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1141842Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1142270Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1142971Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1143571Z E1204 
10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1144368Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1144799Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1145515Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1146101Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1146872Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1147568Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1148319Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1148905Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1149657Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1150237Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1150988Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1151291Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1151863Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1152166Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1152614Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1153495Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1154027Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1154774Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1155351Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1156096Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1156793Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1157314Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1157951Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1158411Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1158922Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1159344Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1159702Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1160208Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1160645Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1161029Z E1204 10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1161724Z E1204 
10:33:52.709000 94591 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1161810Z FAILED [0.3262s] [100%] 2025-12-04T10:35:21.1161815Z 2025-12-04T10:35:21.1161937Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.1162205Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1162305Z Traceback (most recent call last): 2025-12-04T10:35:21.1162626Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1162736Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1163146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1163364Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1163797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1163959Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1164390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1164508Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1164962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1165235Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1165691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1165835Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1166261Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1166362Z return self._compile_to_module() 2025-12-04T10:35:21.1166773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1166904Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1167393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1167502Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1167927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1168121Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1168619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1168768Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1169173Z File "/tmp/tmpfd_whw1o/4l/c4lv633hnzo7unmp7eoiv3266ff7xyvpnmehefvqxrsess6rycdf.py", line 51, in 2025-12-04T10:35:21.1169572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1169663Z kernel.precompile( 2025-12-04T10:35:21.1170131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1170276Z self._precompile_worker() 2025-12-04T10:35:21.1170781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1170969Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1171474Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1171639Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1172017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1172218Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1172590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1172874Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1173068Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1173334Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1173438Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1173549Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1173635Z xmask = xindex < xnumel 2025-12-04T10:35:21.1173712Z x0 = xindex 2025-12-04T10:35:21.1173809Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1173904Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1173972Z ^ 2025-12-04T10:35:21.1174303Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1174310Z 2025-12-04T10:35:21.1174919Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1174928Z 2025-12-04T10:35:21.1174932Z 2025-12-04T10:35:21.1175116Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1175795Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1175803Z 2025-12-04T10:35:21.1176028Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1176212Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1176299Z frames [('total', 1)] 2025-12-04T10:35:21.1176390Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1176838Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1177024Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1177100Z graph_break [] 2025-12-04T10:35:21.1177371Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1177469Z Traceback (most recent call last): 2025-12-04T10:35:21.1177787Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1177889Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1178303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1178554Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1178988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1179209Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1179639Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1179809Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1180269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1180607Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1181045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1181171Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1181580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1181681Z return self._compile_to_module() 2025-12-04T10:35:21.1182090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1182226Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1182665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1182770Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1183203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1183399Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1183893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1184003Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1184446Z File "/tmp/tmpncyqbmyv/ip/cip6fbymc5rcgxth5iwbypyzci4ugmmybsjbyrepen7saybti77z.py", line 51, in 2025-12-04T10:35:21.1184836Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1184930Z kernel.precompile( 2025-12-04T10:35:21.1185400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1185496Z self._precompile_worker() 2025-12-04T10:35:21.1186000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1186148Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1186652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1186815Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1187251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1187455Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1187833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1188117Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1188318Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1188583Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1188727Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1188843Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1188930Z xmask = xindex < xnumel 2025-12-04T10:35:21.1189003Z x0 = xindex 2025-12-04T10:35:21.1189101Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1189203Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1189275Z ^ 
2025-12-04T10:35:21.1189602Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1189648Z 2025-12-04T10:35:21.1190263Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1190306Z 2025-12-04T10:35:21.1190310Z 2025-12-04T10:35:21.1190490Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1191172Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1191179Z 2025-12-04T10:35:21.1191402Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1191588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1191671Z frames [('total', 1)] 2025-12-04T10:35:21.1191766Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1192170Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1192354Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1192438Z graph_break [] 2025-12-04T10:35:21.1192616Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1192696Z frames [('total', 1)] 2025-12-04T10:35:21.1192790Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1192972Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1193363Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1193447Z graph_break [] 2025-12-04T10:35:21.1193567Z =================================== FAILURES 
=================================== 2025-12-04T10:35:21.1193829Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1193933Z Traceback (most recent call last): 2025-12-04T10:35:21.1194243Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1194347Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1194762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1194969Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1195406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1195565Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1196054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1196190Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1196668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1196941Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1197383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1197503Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1197953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1198051Z return self._compile_to_module() 2025-12-04T10:35:21.1198461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1198606Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1199044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1199199Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1199617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1199873Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1200370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1200478Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1200901Z File "/tmp/tmp6y97_p_5/cn/ccnytioj4573pvampbf3surjtlxtfpnbqbakpi2at3gdi7stnwrg.py", line 51, in 2025-12-04T10:35:21.1201291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1201377Z kernel.precompile( 2025-12-04T10:35:21.1201856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1201955Z self._precompile_worker() 2025-12-04T10:35:21.1202458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1202613Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1203114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1203285Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1203666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1203871Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:21.1204250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1204531Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1204727Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1204990Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1205091Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1205210Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1205299Z xmask = xindex < xnumel 2025-12-04T10:35:21.1205372Z x0 = xindex 2025-12-04T10:35:21.1205476Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1205570Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1205638Z ^ 2025-12-04T10:35:21.1206012Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1206017Z 2025-12-04T10:35:21.1206624Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1206630Z 2025-12-04T10:35:21.1206634Z 2025-12-04T10:35:21.1206819Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1207496Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1207541Z 2025-12-04T10:35:21.1207996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1208202Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1208286Z frames [('total', 1)] 2025-12-04T10:35:21.1208379Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1208784Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1209056Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1209139Z graph_break [] 2025-12-04T10:35:21.1209315Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1209396Z frames [('total', 1)] 2025-12-04T10:35:21.1209544Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1209724Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1210125Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1210202Z graph_break [] 2025-12-04T10:35:21.1210376Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1210459Z frames [('total', 1)] 2025-12-04T10:35:21.1210548Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1210737Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1211128Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1211206Z graph_break [] 2025-12-04T10:35:21.1211767Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.xml - 2025-12-04T10:35:21.1211910Z =========================== short test summary info ============================ 2025-12-04T10:35:21.1212561Z FAILED [0.3262s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1212837Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1212937Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1213056Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1213142Z xmask = xindex < xnumel 2025-12-04T10:35:21.1213212Z x0 = xindex 2025-12-04T10:35:21.1213312Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1213406Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1213475Z ^ 2025-12-04T10:35:21.1213804Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1213812Z 2025-12-04T10:35:21.1214414Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1214421Z 2025-12-04T10:35:21.1214425Z 2025-12-04T10:35:21.1214608Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1215349Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1215356Z 2025-12-04T10:35:21.1215581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1215757Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.1215946Z ================== 1 failed, 187 deselected, 2 rerun in 2.47s ================== 2025-12-04T10:35:21.1216025Z Got exit code 1 2025-12-04T10:35:21.1216111Z Retrying single test... 2025-12-04T10:35:21.1216511Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.xml 2025-12-04T10:35:21.1216707Z ============================= test session starts ============================== 2025-12-04T10:35:21.1216998Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.1217086Z cachedir: .pytest_cache 2025-12-04T10:35:21.1217531Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.1217636Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.1217727Z configfile: pytest.ini 2025-12-04T10:35:21.1218248Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:21.1218436Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:21.1219126Z stepcurrent: skipping 63 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1219221Z Running 1 items in this shard 2025-12-04T10:35:21.1219225Z 2025-12-04T10:35:21.1220163Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1220805Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1221268Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1221736Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1222155Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1222517Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1222972Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1223420Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1223845Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1228655Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1229129Z 
E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1229606Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1230084Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1230462Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1232026Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1232488Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1233264Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1233705Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1234459Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1235072Z E1204 10:34:02.602000 94772 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1235844Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1236286Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1237009Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1237554Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1238302Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1239004Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1239733Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1240337Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1241066Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1241657Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1242418Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1242776Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1243357Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1243673Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1244128Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1245025Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1245603Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1246363Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1246989Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1247743Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1248451Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1248977Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1249631Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1250099Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1250575Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1251006Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1251374Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1251844Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1252293Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1252648Z E1204 10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1253355Z E1204 
10:34:02.602000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1253471Z ('RERUN', {'yellow': True}) [1.7878s] [100%] 2025-12-04T10:35:21.1254415Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1255105Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1255582Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1256059Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1256480Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1256852Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1257357Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1257807Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1258235Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1258704Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1259212Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1259726Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1260197Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1260507Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1262054Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1262526Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1263262Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1263702Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1264414Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1265028Z E1204 
10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1265758Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1266254Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1267013Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1267562Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1268308Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1269009Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1269804Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1270401Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1271170Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1271755Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1272555Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1272868Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1273448Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1273760Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1274209Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1275098Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1275636Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1276392Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1276975Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1277724Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1278389Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1278918Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1279614Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1280082Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1280557Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1280983Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1281347Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1281859Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1282314Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1282670Z E1204 10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1283419Z E1204 
10:34:02.965000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1283569Z ('RERUN', {'yellow': True}) [0.3297s] [100%] 2025-12-04T10:35:21.1284515Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1285163Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1285633Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1286113Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1286533Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1286907Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1287364Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1287821Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1288254Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1288692Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1289129Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1289591Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1290064Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1290371Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1291952Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1292417Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1293154Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1293642Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1294355Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1295003Z E1204 
10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1295752Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1296341Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1297063Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1297607Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1298359Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1299108Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1299842Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1300440Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1301174Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1301758Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1302513Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1302826Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1303402Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1303757Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1304216Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1305103Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1305648Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1306500Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1307086Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1308164Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1308841Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1309425Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1310073Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1310541Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1311022Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1311444Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1311812Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1312286Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1312729Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1313080Z E1204 10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1313790Z E1204 
10:34:03.295000 94772 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1313881Z FAILED [0.3283s] [100%] 2025-12-04T10:35:21.1313886Z 2025-12-04T10:35:21.1314025Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.1314311Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1314416Z Traceback (most recent call last): 2025-12-04T10:35:21.1314744Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1314861Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1315288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1315612Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1316061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1316240Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1316678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1316810Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1317279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1317616Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1318066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1318198Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1318611Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1318764Z return self._compile_to_module() 2025-12-04T10:35:21.1319179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1319366Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1319827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1319949Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1320377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1320581Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1321093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1321219Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1321659Z File "/tmp/tmpbxy29ftn/ja/cjafqmc53vipi3ljoyzauf6nmdu7kws535ie5ara563rhupojgd2.py", line 51, in 2025-12-04T10:35:21.1322066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1322165Z kernel.precompile( 2025-12-04T10:35:21.1322639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1322747Z self._precompile_worker() 2025-12-04T10:35:21.1323257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1323411Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1323935Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1324110Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1324503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1324715Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1325095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1325395Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1328254Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1328608Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1328757Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1328956Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1329058Z xmask = xindex < xnumel 2025-12-04T10:35:21.1329138Z x0 = xindex 2025-12-04T10:35:21.1329241Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1329345Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1329420Z ^ 2025-12-04T10:35:21.1329749Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1329758Z 2025-12-04T10:35:21.1330373Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1330379Z 2025-12-04T10:35:21.1330404Z 2025-12-04T10:35:21.1330587Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1331275Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1331280Z 2025-12-04T10:35:21.1331511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1331740Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1331841Z frames [('total', 1)] 2025-12-04T10:35:21.1331936Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1332388Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1332579Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1332660Z graph_break [] 2025-12-04T10:35:21.1332933Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1333041Z Traceback (most recent call last): 2025-12-04T10:35:21.1333355Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1333467Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1333882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1334100Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1334540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1334704Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1335147Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1335267Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1335724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1336007Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1336448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1336575Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1336981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1337085Z return self._compile_to_module() 2025-12-04T10:35:21.1337497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1337636Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1338087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1338285Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1338755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1338955Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1339521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1339630Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1340059Z File "/tmp/tmprxf_h57j/ma/cma4karkspr5tydudgs6yi6ovkwhsh4sfaofr7da5a74lbo5nhku.py", line 51, in 2025-12-04T10:35:21.1340455Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1340552Z kernel.precompile( 2025-12-04T10:35:21.1341027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1341122Z self._precompile_worker() 2025-12-04T10:35:21.1341635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1341892Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1342489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1342654Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1343072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1343283Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1343654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1343939Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1344144Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1344414Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1344520Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1344637Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1344725Z xmask = xindex < xnumel 2025-12-04T10:35:21.1344806Z x0 = xindex 2025-12-04T10:35:21.1344907Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1345005Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1345088Z ^ 
2025-12-04T10:35:21.1345418Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1345423Z 2025-12-04T10:35:21.1346038Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1346045Z 2025-12-04T10:35:21.1346049Z 2025-12-04T10:35:21.1346232Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1346910Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1346923Z 2025-12-04T10:35:21.1347147Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1347336Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1347429Z frames [('total', 1)] 2025-12-04T10:35:21.1347521Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1347920Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1348184Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1348266Z graph_break [] 2025-12-04T10:35:21.1348492Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1348572Z frames [('total', 1)] 2025-12-04T10:35:21.1348663Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1348856Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1349250Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1349329Z graph_break [] 2025-12-04T10:35:21.1349458Z =================================== FAILURES 
=================================== 2025-12-04T10:35:21.1349730Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1349831Z Traceback (most recent call last): 2025-12-04T10:35:21.1350150Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1350258Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1350685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1350895Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1351371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1351545Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1351979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1352172Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1352626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1352901Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1353352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1353476Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1353886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1353996Z return self._compile_to_module() 2025-12-04T10:35:21.1354406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1354555Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1354992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1355098Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1355526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1355748Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1356283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1356387Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1356823Z File "/tmp/tmpvuf9ihr7/so/csolmffcaq42dab5nfjqsymkpfghw2wnevyext463tbcu65pheq6.py", line 51, in 2025-12-04T10:35:21.1357223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1357319Z kernel.precompile( 2025-12-04T10:35:21.1357791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1357942Z self._precompile_worker() 2025-12-04T10:35:21.1358451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1358647Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1359156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1359325Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1359715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1359925Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:21.1360304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1360590Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1360788Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1361065Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1361169Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1361285Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1361385Z xmask = xindex < xnumel 2025-12-04T10:35:21.1361501Z x0 = xindex 2025-12-04T10:35:21.1361609Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1361705Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1361775Z ^ 2025-12-04T10:35:21.1362155Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1362160Z 2025-12-04T10:35:21.1362766Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1362773Z 2025-12-04T10:35:21.1362777Z 2025-12-04T10:35:21.1362965Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1363640Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1363645Z 2025-12-04T10:35:21.1363875Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1364059Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1364145Z frames [('total', 1)] 2025-12-04T10:35:21.1364244Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1364641Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1364828Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1364915Z graph_break [] 2025-12-04T10:35:21.1365092Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1365177Z frames [('total', 1)] 2025-12-04T10:35:21.1365275Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1365459Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1365858Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1365940Z graph_break [] 2025-12-04T10:35:21.1366119Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1366209Z frames [('total', 1)] 2025-12-04T10:35:21.1366308Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1366489Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1366885Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1367016Z graph_break [] 2025-12-04T10:35:21.1367572Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.xml - 2025-12-04T10:35:21.1367764Z =========================== short test summary info ============================ 2025-12-04T10:35:21.1368421Z FAILED [0.3283s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1368694Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1368795Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1368911Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1369007Z xmask = xindex < xnumel 2025-12-04T10:35:21.1369078Z x0 = xindex 2025-12-04T10:35:21.1369177Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1369277Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1369350Z ^ 2025-12-04T10:35:21.1369682Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1369687Z 2025-12-04T10:35:21.1370298Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1370303Z 2025-12-04T10:35:21.1370306Z 2025-12-04T10:35:21.1370539Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1371212Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1371256Z 2025-12-04T10:35:21.1371481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1371639Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.1371811Z ================== 1 failed, 187 deselected, 2 rerun in 2.48s ================== 2025-12-04T10:35:21.1371893Z Got exit code 1 2025-12-04T10:35:21.1372363Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T10:35:21.1372713Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:21.1373121Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.xml 2025-12-04T10:35:21.1373257Z ============================= test session starts ============================== 2025-12-04T10:35:21.1373552Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.1373649Z cachedir: .pytest_cache 2025-12-04T10:35:21.1374093Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.1374199Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.1374284Z configfile: pytest.ini 2025-12-04T10:35:21.1374746Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T10:35:21.1374937Z collecting ... collected 188 items / 64 deselected / 124 selected 2025-12-04T10:35:21.1375052Z stepcurrent: skipping 64 already run items. 2025-12-04T10:35:21.1375145Z Running 124 items in this shard 2025-12-04T10:35:21.1375149Z 2025-12-04T10:35:21.1376156Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1376799Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1377311Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1377822Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1378244Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1378604Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1379122Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1379568Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1379992Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1380426Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = 
tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1380849Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1381349Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1381814Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1382162Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1383707Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1384163Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1384898Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1385325Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1386030Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1386643Z E1204 
10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1387364Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1387794Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1388506Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1389092Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1389863Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1390562Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1391281Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1391868Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1392586Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1393228Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1393984Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1394323Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1394896Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1395199Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1395649Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1396595Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1397125Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1397879Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1398453Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1399197Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1399853Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1400372Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1401012Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1401522Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1402042Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1402457Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1402813Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1403271Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1403714Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1404067Z E1204 10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1404762Z E1204 
10:34:13.154000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1404869Z ('RERUN', {'yellow': True}) [1.7767s] [ 0%] 2025-12-04T10:35:21.1405865Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1406541Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1406998Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1407475Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1408135Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1408516Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1408971Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1409416Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1409840Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1410272Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1410695Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1411157Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1411617Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1411915Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1413542Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1414066Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1414798Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1415222Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1415921Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1416602Z E1204 
10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1417395Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1417827Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1418595Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1419191Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1419935Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1420630Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1421345Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1421937Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1422653Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1423234Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1423992Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1424291Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1424865Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1425168Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1425670Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1426645Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1427180Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1427943Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1428519Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1429274Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1429965Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1430483Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1431170Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1431630Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1432108Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1432523Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1432882Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1433341Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1433783Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1434129Z E1204 10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1434824Z E1204 
10:34:13.519000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1434933Z ('RERUN', {'yellow': True}) [0.3321s] [ 0%] 2025-12-04T10:35:21.1435883Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1436520Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1436984Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1437502Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1437985Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1438345Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1438801Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1439241Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1439666Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1440101Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1440530Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1440994Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1441494Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1441792Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1443370Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1443831Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1444567Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1444991Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1445695Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1446294Z E1204 
10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1447014Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1447444Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1448155Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1448695Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1449474Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1450202Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1450921Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1451508Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1452220Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1452801Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1453596Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1453894Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1454505Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1454805Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1455256Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1456183Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1456726Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1457477Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1458054Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1458796Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1459568Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1460095Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1460736Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1461198Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1461724Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1462177Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1462538Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1462999Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1463437Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1463790Z E1204 10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1464486Z E1204 
10:34:13.850000 94953 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1464570Z FAILED [0.3291s] [ 0%]
2025-12-04T10:35:21.1464575Z
2025-12-04T10:35:21.1464702Z ==================================== RERUNS ====================================
2025-12-04T10:35:21.1464977Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1465118Z Traceback (most recent call last):
2025-12-04T10:35:21.1465426Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1465528Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1466011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1466245Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1466680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1466843Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1467274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1467395Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1467845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1468116Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1468564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1468684Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1469092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1469189Z return self._compile_to_module()
2025-12-04T10:35:21.1469598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1469736Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1470171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1470279Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1470702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1470900Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1471404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1471555Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1471995Z File "/tmp/tmpohe9brhe/jv/cjvmuecqquthpzzvzvd4jlivnlyxyg64sgspv7borqwcgjjmcwau.py", line 51, in <module>
2025-12-04T10:35:21.1472433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1472520Z kernel.precompile(
2025-12-04T10:35:21.1472997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1473097Z self._precompile_worker()
2025-12-04T10:35:21.1473600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1473752Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1474254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1474419Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1474801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1475011Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1475387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1475792Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1475989Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1476298Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1476396Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1476514Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1476600Z xmask = xindex < xnumel
2025-12-04T10:35:21.1476675Z x0 = xindex
2025-12-04T10:35:21.1476774Z tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1476865Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1476934Z ^
2025-12-04T10:35:21.1477272Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1477277Z
2025-12-04T10:35:21.1477891Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1477896Z
2025-12-04T10:35:21.1477900Z
2025-12-04T10:35:21.1478082Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1478770Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1478775Z
2025-12-04T10:35:21.1478998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1479190Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1479269Z frames [('total', 1)]
2025-12-04T10:35:21.1479364Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1479761Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1479948Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1480033Z graph_break []
2025-12-04T10:35:21.1480307Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _
2025-12-04T10:35:21.1480409Z Traceback (most recent call last):
2025-12-04T10:35:21.1480721Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast
2025-12-04T10:35:21.1480821Z y0_fp8, y1_fp8 = compiled_fp8_cast(x)
2025-12-04T10:35:21.1481235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:35:21.1481525Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:35:21.1481998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:35:21.1482160Z raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:35:21.1482592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:35:21.1482713Z mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:35:21.1483167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:35:21.1483439Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:35:21.1483881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
2025-12-04T10:35:21.1484004Z compiled_module = graph.compile_to_module()
2025-12-04T10:35:21.1484410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module
2025-12-04T10:35:21.1484513Z return self._compile_to_module()
2025-12-04T10:35:21.1484960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module
2025-12-04T10:35:21.1485099Z mod = self._compile_to_module_lines(wrapper_code)
2025-12-04T10:35:21.1485534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines
2025-12-04T10:35:21.1485681Z mod = PyCodeCache.load_by_key_path(
2025-12-04T10:35:21.1486107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path
2025-12-04T10:35:21.1486297Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
2025-12-04T10:35:21.1486797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
2025-12-04T10:35:21.1486901Z exec(code, mod.__dict__, mod.__dict__)
2025-12-04T10:35:21.1487337Z File "/tmp/tmphekkkgde/vg/cvghy7lbglwjfqtrnzdaes42a2pji3o5xqbzwdatfrd3jf2mhnii.py", line 51, in <module>
2025-12-04T10:35:21.1487733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T10:35:21.1487822Z kernel.precompile(
2025-12-04T10:35:21.1488293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile
2025-12-04T10:35:21.1488396Z self._precompile_worker()
2025-12-04T10:35:21.1488900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker
2025-12-04T10:35:21.1489060Z compile_results.append(self._precompile_config(c))
2025-12-04T10:35:21.1489562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config
2025-12-04T10:35:21.1489729Z binary = triton.compile(*compile_args, **compile_kwargs)
2025-12-04T10:35:21.1490109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile
2025-12-04T10:35:21.1490313Z module = src.make_ir(target, options, codegen_fns, module_map, context)
2025-12-04T10:35:21.1490687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir
2025-12-04T10:35:21.1490970Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns,
2025-12-04T10:35:21.1491161Z torch._inductor.exc.InductorError: CompilationError: at 7:11:
2025-12-04T10:35:21.1491430Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2025-12-04T10:35:21.1491581Z xoffset = tl.program_id(0) * XBLOCK
2025-12-04T10:35:21.1491696Z xindex = xoffset + tl.arange(0, XBLOCK)[:]
2025-12-04T10:35:21.1491785Z xmask = xindex < xnumel
2025-12-04T10:35:21.1491898Z x0 = xindex
2025-12-04T10:35:21.1492002Z tmp0 = tl.load(in_ptr0 + (x0), xmask)
2025-12-04T10:35:21.1492101Z tmp1 = tmp0.to(tl.float8e4nv)
2025-12-04T10:35:21.1492173Z ^
2025-12-04T10:35:21.1492511Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')
2025-12-04T10:35:21.1492516Z
2025-12-04T10:35:21.1493122Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:35:21.1493129Z
2025-12-04T10:35:21.1493132Z
2025-12-04T10:35:21.1493319Z To execute this test, run the following from the base repo dir:
2025-12-04T10:35:21.1494012Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32
2025-12-04T10:35:21.1494019Z
2025-12-04T10:35:21.1494243Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:35:21.1494425Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1494513Z frames [('total', 1)]
2025-12-04T10:35:21.1494649Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1495053Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1495279Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1495364Z graph_break []
2025-12-04T10:35:21.1495542Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:35:21.1495625Z frames [('total', 1)]
2025-12-04T10:35:21.1495719Z stats [('calls_captured', 4)]
2025-12-04T10:35:21.1495901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:35:21.1496299Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:35:21.1496380Z graph_break []
2025-12-04T10:35:21.1496501Z =================================== FAILURES
=================================== 2025-12-04T10:35:21.1496783Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1496883Z Traceback (most recent call last): 2025-12-04T10:35:21.1497198Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1497312Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1497722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1497926Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1498365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1498526Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1498961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1499128Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1499581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1499856Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1500297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1500418Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1500878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1500975Z return self._compile_to_module() 2025-12-04T10:35:21.1501429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1501564Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1502004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1502114Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1502534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1502727Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1503222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1503327Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1503764Z File "/tmp/tmpsujxd4mo/2h/c2hmwuahhhp3slo3a6xbqzk2k6tjrqeutk346ynmsnxwrrha7lyn.py", line 51, in <module> 2025-12-04T10:35:21.1504155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1504246Z kernel.precompile( 2025-12-04T10:35:21.1504757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1504851Z self._precompile_worker() 2025-12-04T10:35:21.1505402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1505548Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1506055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1506224Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1506602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1506807Z module = src.make_ir(target, options, codegen_fns,
module_map, context) 2025-12-04T10:35:21.1507183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1507468Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1507667Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1508102Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1508207Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1508319Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1508406Z xmask = xindex < xnumel 2025-12-04T10:35:21.1508481Z x0 = xindex 2025-12-04T10:35:21.1508579Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1508673Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1508749Z ^ 2025-12-04T10:35:21.1509075Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1509080Z 2025-12-04T10:35:21.1509693Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1509701Z 2025-12-04T10:35:21.1509705Z 2025-12-04T10:35:21.1509889Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1510573Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1510665Z 2025-12-04T10:35:21.1510893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1511070Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1511211Z frames [('total', 1)] 2025-12-04T10:35:21.1511304Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1511706Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1511892Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1511971Z graph_break [] 2025-12-04T10:35:21.1512148Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1512231Z frames [('total', 1)] 2025-12-04T10:35:21.1512320Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1512501Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1512893Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1512973Z graph_break [] 2025-12-04T10:35:21.1513153Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1513232Z frames [('total', 1)] 2025-12-04T10:35:21.1513321Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1513567Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1513958Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1514041Z graph_break [] 2025-12-04T10:35:21.1514678Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.xml - 2025-12-04T10:35:21.1514818Z =========================== short test summary info ============================ 2025-12-04T10:35:21.1515493Z FAILED [0.3291s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1515765Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1515870Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1515983Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1516069Z xmask = xindex < xnumel 2025-12-04T10:35:21.1516143Z x0 = xindex 2025-12-04T10:35:21.1516239Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1521085Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1521185Z ^ 2025-12-04T10:35:21.1521529Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1521535Z 2025-12-04T10:35:21.1522145Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1522162Z 2025-12-04T10:35:21.1522166Z 2025-12-04T10:35:21.1522354Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1523052Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1523057Z 2025-12-04T10:35:21.1523292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1523447Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.1523623Z ================== 1 failed, 64 deselected, 2 rerun in 2.47s =================== 2025-12-04T10:35:21.1523710Z Got exit code 1 2025-12-04T10:35:21.1523803Z Retrying single test... 2025-12-04T10:35:21.1524212Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.xml 2025-12-04T10:35:21.1524422Z ============================= test session starts ============================== 2025-12-04T10:35:21.1524724Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.1524865Z cachedir: .pytest_cache 2025-12-04T10:35:21.1525322Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.1525440Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.1525537Z configfile: pytest.ini 2025-12-04T10:35:21.1526000Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:21.1526198Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:21.1526819Z stepcurrent: skipping 64 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1526922Z Running 1 items in this shard 2025-12-04T10:35:21.1526931Z 2025-12-04T10:35:21.1527889Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1528574Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1529042Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1529551Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1529970Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1530331Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1530794Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1531242Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1531668Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1532114Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T10:35:21.1532538Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1532998Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1533460Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1533759Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1535310Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1535815Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1536599Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1537030Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1537748Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1538352Z E1204 10:34:23.716000 95134 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1539133Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1539570Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1540325Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1540865Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1541636Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1542341Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1543053Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1543640Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1544358Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1544938Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1545703Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1546053Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1546629Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1546924Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1547376Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1548272Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1548886Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1549645Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1550216Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1550967Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1551623Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1552143Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1552829Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1553289Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1553810Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1554230Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1554593Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1555053Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1555497Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1555865Z E1204 10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1556597Z E1204 
10:34:23.716000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1556705Z ('RERUN', {'yellow': True}) [1.7907s] [100%] 2025-12-04T10:35:21.1557660Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1558303Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1558763Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1559234Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1559665Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1560024Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1560523Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1561007Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1561434Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1561866Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1562290Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1562747Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1563216Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1563518Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1565126Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1565619Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1566405Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1566836Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1567540Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1568144Z E1204 
10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1568863Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1569295Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1570012Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1570554Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1571289Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1571987Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1572784Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1573374Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1574088Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1574668Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1575430Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1575737Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1576360Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1576659Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1577106Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1578038Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1578571Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1579407Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1579980Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1580726Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1581384Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1581906Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1582556Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1583016Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1583492Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1583906Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1584315Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1584820Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1585265Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1585628Z E1204 10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1586324Z E1204 
10:34:24.081000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1586437Z ('RERUN', {'yellow': True}) [0.3317s] [100%] 2025-12-04T10:35:21.1587396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1588039Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1588544Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1589018Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1589476Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1589838Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1590293Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1590741Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1591162Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1591596Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1592023Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1592482Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1592944Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1593243Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1594791Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1595252Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1596037Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1596548Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1597252Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1597858Z E1204 
10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1598583Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1599013Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1599729Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1600311Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1601048Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1601778Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1602503Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1603094Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1603815Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1604394Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1605153Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1605458Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1606060Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1606391Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1606838Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1607726Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1608424Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1609338Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1609915Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1610658Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1611329Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1611851Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1612494Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1613016Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1613498Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1613967Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1614326Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1614789Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1615229Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1615581Z E1204 10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1616333Z E1204 
10:34:24.414000 95134 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1616420Z FAILED [0.3315s] [100%] 2025-12-04T10:35:21.1616424Z 2025-12-04T10:35:21.1616547Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.1616823Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1616936Z Traceback (most recent call last): 2025-12-04T10:35:21.1617249Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1617353Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1617776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1617991Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1618428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1618596Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1619076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1619205Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1619663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1619990Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1620487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1620610Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1621022Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1621124Z return self._compile_to_module() 2025-12-04T10:35:21.1621535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1621675Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1622113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1622229Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1622646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1622849Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1623543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1623695Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1624249Z File "/tmp/tmpjib6zihx/vn/cvncu4cmth7bwqhqbuxaqrwjz5bymnvrdyzawn76e3doqn5z6lf3.py", line 51, in 2025-12-04T10:35:21.1624747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1624841Z kernel.precompile( 2025-12-04T10:35:21.1625324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1625430Z self._precompile_worker() 2025-12-04T10:35:21.1625987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1626140Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1626646Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1626821Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1627208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1627419Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1627798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1628082Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1628277Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1628551Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1628654Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1628773Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1628863Z xmask = xindex < xnumel 2025-12-04T10:35:21.1628941Z x0 = xindex 2025-12-04T10:35:21.1629048Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1629142Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1629219Z ^ 2025-12-04T10:35:21.1629550Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1629556Z 2025-12-04T10:35:21.1630164Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1630221Z 2025-12-04T10:35:21.1630225Z 2025-12-04T10:35:21.1630415Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1631145Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1631153Z 2025-12-04T10:35:21.1631382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1631565Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1631655Z frames [('total', 1)] 2025-12-04T10:35:21.1631751Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1632155Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1632341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1632430Z graph_break [] 2025-12-04T10:35:21.1632705Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1632808Z Traceback (most recent call last): 2025-12-04T10:35:21.1633127Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1633229Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1633691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1633900Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1634375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1634541Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1634971Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1635101Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1635554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1635850Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1636325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1636446Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1636856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1636959Z return self._compile_to_module() 2025-12-04T10:35:21.1637467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1637614Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1638051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1638163Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1638594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1638788Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1639293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1639399Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1639835Z File "/tmp/tmp1piy4oy2/yj/cyjntxgpam3ussp25tu34tk337frgpermhbqgb4umn3fipyxqin3.py", line 51, in 2025-12-04T10:35:21.1640231Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1640367Z kernel.precompile( 2025-12-04T10:35:21.1640886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1640986Z self._precompile_worker() 2025-12-04T10:35:21.1641492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1641645Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1642148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1642317Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1642699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1642908Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1643292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1643576Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1643772Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1644092Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1644195Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1644311Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1644451Z xmask = xindex < xnumel 2025-12-04T10:35:21.1644528Z x0 = xindex 2025-12-04T10:35:21.1644634Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1644732Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1644802Z ^ 
2025-12-04T10:35:21.1645135Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1645140Z 2025-12-04T10:35:21.1645752Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1645757Z 2025-12-04T10:35:21.1645761Z 2025-12-04T10:35:21.1645952Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1646641Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1646649Z 2025-12-04T10:35:21.1646871Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1647058Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1647143Z frames [('total', 1)] 2025-12-04T10:35:21.1647247Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1647652Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1647840Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1647923Z graph_break [] 2025-12-04T10:35:21.1648101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1648186Z frames [('total', 1)] 2025-12-04T10:35:21.1648283Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1648463Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1648862Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1648951Z graph_break [] 2025-12-04T10:35:21.1649074Z =================================== FAILURES 
=================================== 2025-12-04T10:35:21.1649350Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1649525Z Traceback (most recent call last): 2025-12-04T10:35:21.1649841Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1650076Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1650495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1650711Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1651163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1651327Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1651767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1651887Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1652345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1652625Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1653069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1653244Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1653655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1653800Z return self._compile_to_module() 2025-12-04T10:35:21.1654217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1654355Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1654800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1654916Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1655340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1655540Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1656089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1656193Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1656647Z File "/tmp/tmpm20ydfom/jx/cjxc5kdxyvsf2tfwnsber2e3zuomavoytzyrnjwv6gqktkcc32iz.py", line 51, in 2025-12-04T10:35:21.1657041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1657135Z kernel.precompile( 2025-12-04T10:35:21.1657610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1657704Z self._precompile_worker() 2025-12-04T10:35:21.1658216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1658368Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1658878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1659102Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1659487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1659697Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:21.1660067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1660399Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1660643Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1660912Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1661023Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1661136Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1661227Z xmask = xindex < xnumel 2025-12-04T10:35:21.1661308Z x0 = xindex 2025-12-04T10:35:21.1661409Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1661507Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1661593Z ^ 2025-12-04T10:35:21.1661923Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1661928Z 2025-12-04T10:35:21.1662547Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1662551Z 2025-12-04T10:35:21.1662555Z 2025-12-04T10:35:21.1662739Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1663479Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1663490Z 2025-12-04T10:35:21.1663717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1663939Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1664031Z frames [('total', 1)] 2025-12-04T10:35:21.1664130Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1664528Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1664718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1664796Z graph_break [] 2025-12-04T10:35:21.1664981Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1665061Z frames [('total', 1)] 2025-12-04T10:35:21.1665152Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1665339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1665734Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1665814Z graph_break [] 2025-12-04T10:35:21.1665997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1666078Z frames [('total', 1)] 2025-12-04T10:35:21.1666171Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1666358Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1666750Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1666832Z graph_break [] 2025-12-04T10:35:21.1667387Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.xml - 2025-12-04T10:35:21.1667528Z =========================== short test summary info ============================ 2025-12-04T10:35:21.1668205Z FAILED [0.3315s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1668478Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1668578Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1668689Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1668779Z xmask = xindex < xnumel 2025-12-04T10:35:21.1668906Z x0 = xindex 2025-12-04T10:35:21.1669008Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1669100Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1669175Z ^ 2025-12-04T10:35:21.1669544Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1669549Z 2025-12-04T10:35:21.1670169Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1670174Z 2025-12-04T10:35:21.1670178Z 2025-12-04T10:35:21.1670357Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1671042Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1671046Z 2025-12-04T10:35:21.1671276Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1671425Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.1671594Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ================== 2025-12-04T10:35:21.1671671Z Got exit code 1 2025-12-04T10:35:21.1671758Z Retrying single test... 2025-12-04T10:35:21.1672205Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.xml 2025-12-04T10:35:21.1672336Z ============================= test session starts ============================== 2025-12-04T10:35:21.1672628Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.1672761Z cachedir: .pytest_cache 2025-12-04T10:35:21.1673204Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.1673307Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.1673398Z configfile: pytest.ini 2025-12-04T10:35:21.1673856Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:21.1674046Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:21.1674658Z stepcurrent: skipping 64 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1674753Z Running 1 items in this shard 2025-12-04T10:35:21.1674758Z 2025-12-04T10:35:21.1675709Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1676401Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1676871Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1677339Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1677757Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1678114Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1678571Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1679017Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1679490Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1679956Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T10:35:21.1680379Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1680843Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1681302Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1681597Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1683182Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1683633Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1684430Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1684852Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1685563Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1686163Z E1204 10:34:34.308000 95315 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1686882Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1687310Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1688021Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1688563Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1689295Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1689988Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1690699Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1691286Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1692097Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1692678Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1693430Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1693728Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1694302Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1694600Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1695048Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1695973Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1696542Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1697291Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1697867Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1698615Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1699323Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1699848Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1700488Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1700947Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1701419Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1701832Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1702194Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1702650Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1703089Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1703491Z E1204 10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1704228Z E1204 
10:34:34.308000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1704344Z ('RERUN', {'yellow': True}) [1.7981s] [100%] 2025-12-04T10:35:21.1705292Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1705930Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1706393Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1706863Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1707319Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1707677Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1708289Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1708814Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1709236Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1709668Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1710092Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1710555Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1711012Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1711310Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1712851Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1713307Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1714038Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1714462Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1715171Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1715919Z E1204 
10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1716664Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1717089Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1717803Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1718346Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1719079Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1719829Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1720542Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1721171Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1721890Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1722468Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1723221Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1723520Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1724096Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1724394Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1724842Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1725730Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1726262Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1727020Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1727642Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1728436Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1729094Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1729613Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1730256Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1730712Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1731188Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1731667Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1732027Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1732487Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1732967Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1733313Z E1204 10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1734013Z E1204 
10:34:34.672000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1734122Z ('RERUN', {'yellow': True}) [0.3298s] [100%] 2025-12-04T10:35:21.1735069Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1735706Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1736213Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1736684Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1737100Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1737458Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1737915Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1738362Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1738784Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T10:35:21.1739307Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T10:35:21.1739770Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T10:35:21.1740229Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T10:35:21.1740691Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T10:35:21.1740989Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1742534Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1742985Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1743754Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1744213Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1744916Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T10:35:21.1745516Z E1204 
10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T10:35:21.1746235Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T10:35:21.1746662Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T10:35:21.1747371Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T10:35:21.1747913Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T10:35:21.1748648Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T10:35:21.1749422Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T10:35:21.1750133Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T10:35:21.1750721Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T10:35:21.1751433Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T10:35:21.1752099Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T10:35:21.1752852Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1753149Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1753725Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T10:35:21.1754020Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1754469Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1755355Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1755927Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1756680Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1757290Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T10:35:21.1758040Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1758693Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1759209Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T10:35:21.1759857Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1760313Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1760790Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1761209Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T10:35:21.1761569Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T10:35:21.1762025Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1762466Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1762812Z E1204 10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1763548Z E1204 
10:34:35.002000 95315 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1763674Z FAILED [0.3281s] [100%] 2025-12-04T10:35:21.1763682Z 2025-12-04T10:35:21.1763800Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.1764077Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1764185Z Traceback (most recent call last): 2025-12-04T10:35:21.1764492Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1764597Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1765013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1765219Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1765660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1765819Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1766251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1766418Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1766868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1767136Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1767630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1767753Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1768161Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1768258Z return self._compile_to_module() 2025-12-04T10:35:21.1768668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1768809Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1769248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1769357Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1769774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1769970Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1770469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1770575Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1771010Z File "/tmp/tmpe3k2j0xr/rx/crxsbidmuj75eeq4quiyndijtubzbohtghod3g2vcmjhcaiv5e3i.py", line 51, in 2025-12-04T10:35:21.1771406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1771495Z kernel.precompile( 2025-12-04T10:35:21.1771967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1772059Z self._precompile_worker() 2025-12-04T10:35:21.1772568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1772717Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1773219Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1773437Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1773878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1774084Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1774459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1774740Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1774938Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1775203Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1775299Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1775413Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1775502Z xmask = xindex < xnumel 2025-12-04T10:35:21.1775578Z x0 = xindex 2025-12-04T10:35:21.1775678Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1775772Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1775847Z ^ 2025-12-04T10:35:21.1776177Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1776182Z 2025-12-04T10:35:21.1776837Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1776843Z 2025-12-04T10:35:21.1776885Z 2025-12-04T10:35:21.1777069Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1777753Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1777760Z 2025-12-04T10:35:21.1777984Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1778163Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1778247Z frames [('total', 1)] 2025-12-04T10:35:21.1778343Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1778743Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1778925Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1779011Z graph_break [] 2025-12-04T10:35:21.1779335Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1779439Z Traceback (most recent call last): 2025-12-04T10:35:21.1779753Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1779852Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1780267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1780471Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1780907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1781073Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1781502Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1781622Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1782074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1782341Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1782833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1782952Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1783402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1783507Z return self._compile_to_module() 2025-12-04T10:35:21.1783919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1784059Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1784496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1784605Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1785023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1785221Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1785723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1785825Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1786282Z File "/tmp/tmpiq1_n3h8/ye/cyelgf577euwxblutrz6ac4dsqsrig56dzg54ilfdqpuceroxu5d.py", line 51, in 2025-12-04T10:35:21.1786675Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1786762Z kernel.precompile( 2025-12-04T10:35:21.1787272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1787366Z self._precompile_worker() 2025-12-04T10:35:21.1787874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1788025Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1788531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1788699Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1789081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1789281Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1789654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1789935Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1790132Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1790402Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1790504Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1790618Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1790707Z xmask = xindex < xnumel 2025-12-04T10:35:21.1790779Z x0 = xindex 2025-12-04T10:35:21.1790877Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1790973Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1791047Z ^ 
2025-12-04T10:35:21.1791378Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1791383Z 2025-12-04T10:35:21.1791990Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1791995Z 2025-12-04T10:35:21.1792000Z 2025-12-04T10:35:21.1792183Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1792912Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1792918Z 2025-12-04T10:35:21.1793183Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1793366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1793451Z frames [('total', 1)] 2025-12-04T10:35:21.1793547Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1793944Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1794130Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1794212Z graph_break [] 2025-12-04T10:35:21.1794386Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1794470Z frames [('total', 1)] 2025-12-04T10:35:21.1794567Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1794747Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1795143Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1795221Z graph_break [] 2025-12-04T10:35:21.1795379Z =================================== FAILURES 
=================================== 2025-12-04T10:35:21.1795656Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T10:35:21.1795754Z Traceback (most recent call last): 2025-12-04T10:35:21.1796159Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T10:35:21.1796264Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T10:35:21.1796675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1796889Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1797321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1797483Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1797921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1798042Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1798494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1798771Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1799209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1799332Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1799736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1799834Z return self._compile_to_module() 2025-12-04T10:35:21.1800252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1800387Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1800827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1800932Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1801351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1801548Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1802044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1802197Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1802670Z File "/tmp/tmpakkjjst6/i2/ci2x3fb3ijnofyls2b5v3dzbvgt6g7uziphmuwdohkhqj2y6hre3.py", line 51, in 2025-12-04T10:35:21.1803064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1803159Z kernel.precompile( 2025-12-04T10:35:21.1803628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1803726Z self._precompile_worker() 2025-12-04T10:35:21.1804234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1804379Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1804885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1805048Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1805425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1805678Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T10:35:21.1806101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1806382Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1806617Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1806881Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1806985Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1807099Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1807185Z xmask = xindex < xnumel 2025-12-04T10:35:21.1807263Z x0 = xindex 2025-12-04T10:35:21.1807364Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1807459Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1807536Z ^ 2025-12-04T10:35:21.1808169Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1808176Z 2025-12-04T10:35:21.1808785Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1808793Z 2025-12-04T10:35:21.1808797Z 2025-12-04T10:35:21.1808974Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1809666Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1813363Z 2025-12-04T10:35:21.1813611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1813797Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1813888Z frames [('total', 1)] 2025-12-04T10:35:21.1813984Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1814385Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1814578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1814659Z graph_break [] 2025-12-04T10:35:21.1814842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1814928Z frames [('total', 1)] 2025-12-04T10:35:21.1815022Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1815212Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1815733Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1815813Z graph_break [] 2025-12-04T10:35:21.1816060Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1816147Z frames [('total', 1)] 2025-12-04T10:35:21.1816239Z stats [('calls_captured', 4)] 2025-12-04T10:35:21.1816432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1816822Z inductor [('pattern_matcher_nodes', 4), 
('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1816917Z graph_break [] 2025-12-04T10:35:21.1817480Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.xml - 2025-12-04T10:35:21.1817624Z =========================== short test summary info ============================ 2025-12-04T10:35:21.1818311Z FAILED [0.3281s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T10:35:21.1818585Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1818696Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1818870Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1818959Z xmask = xindex < xnumel 2025-12-04T10:35:21.1819100Z x0 = xindex 2025-12-04T10:35:21.1819202Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T10:35:21.1819361Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T10:35:21.1819442Z ^ 2025-12-04T10:35:21.1819776Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T10:35:21.1819781Z 2025-12-04T10:35:21.1820395Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1820403Z 2025-12-04T10:35:21.1820406Z 2025-12-04T10:35:21.1820590Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1821279Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1821291Z 2025-12-04T10:35:21.1821516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1821666Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:35:21.1821842Z ================== 1 failed, 187 deselected, 2 rerun in 2.49s ================== 2025-12-04T10:35:21.1821928Z Got exit code 1 2025-12-04T10:35:21.1822402Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T10:35:21.1822760Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:21.1823160Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.xml 2025-12-04T10:35:21.1823301Z ============================= test session starts ============================== 2025-12-04T10:35:21.1823596Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.1823688Z cachedir: .pytest_cache 2025-12-04T10:35:21.1824139Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.1824243Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.1824330Z configfile: pytest.ini 2025-12-04T10:35:21.1824793Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
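The failures in this excerpt share one root cause. The generated kernel casts to `tl.float8e4nv`, which Inductor emits for `torch.float8_e4m3fn` (the dtype named in `test_xblock_for_small_numel_float8_e4m3fn_cuda`). The runner's GPU reports `'cc': 86` in the kernel metadata, and Triton lists only `('fp8e4b15', 'fp8e5')` for that device. Below is a minimal sketch of a capability-gated skip. It assumes `fp8e4nv` becomes available from compute capability 8.9. That threshold is an assumption and does not come from this log. `triton_fp8_dtypes` and `skip_unless_fp8e4nv` are hypothetical helpers, not PyTorch or Triton APIs.

```python
import unittest
from typing import Callable, Tuple


def triton_fp8_dtypes(capability: int) -> Tuple[str, ...]:
    """Return the fp8 type names assumed to be usable for a CUDA compute
    capability written as an integer (86 for sm_86, 89 for sm_89).

    Assumption: the base list matches what this log reports for cc 86,
    and fp8e4nv is assumed to be added from cc 89 onward. Illustrative
    helper only, not Triton's API.
    """
    dtypes = {"fp8e4b15", "fp8e5"}
    if capability >= 89:
        dtypes.add("fp8e4nv")
    # Sorted so the tuple ordering matches the message printed in the log.
    return tuple(sorted(dtypes))


def skip_unless_fp8e4nv(capability: int) -> Callable:
    """Hypothetical decorator factory: skip a test when fp8e4nv, the Triton
    type used for torch.float8_e4m3fn casts, is unavailable."""
    return unittest.skipUnless(
        "fp8e4nv" in triton_fp8_dtypes(capability),
        f"fp8e4nv requires compute capability >= 89, got {capability}",
    )


class ExampleFP8Test(unittest.TestCase):
    # Hard-coded to this runner's cc 86 for illustration; a real suite
    # would query the device instead.
    @skip_unless_fp8e4nv(86)
    def test_cast_to_e4m3fn(self):
        self.fail("should not run on cc 86")
```

On a CUDA machine, the integer capability could be derived from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple, as `major * 10 + minor`. Keying the skip on the same dtype list that Triton reports ties the guard to the actual compile error rather than to a specific GPU model name.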
2025-12-04T10:35:21.1825111Z collecting ... collected 188 items / 65 deselected / 123 selected 2025-12-04T10:35:21.1825236Z stepcurrent: skipping 65 already run items. 2025-12-04T10:35:21.1825331Z Running 123 items in this shard 2025-12-04T10:35:21.1825336Z 2025-12-04T10:35:21.1826279Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1826894Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1827253Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:21.1827712Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1828188Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1828663Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:21.1829152Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T10:35:21.1829615Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T10:35:21.1830104Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T10:35:21.1830715Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T10:35:21.1831028Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1832521Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1832982Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1833869Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1834406Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1835167Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1835748Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1836550Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in 
make_ir 2025-12-04T10:35:21.1837203Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1837806Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:21.1838419Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1838721Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1839488Z E1204 10:34:44.436000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1839599Z ('RERUN', {'yellow': True}) [1.3200s] [ 0%] 2025-12-04T10:35:21.1840493Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1841105Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1841506Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:21.1841970Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1842485Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, 
XBLOCK)[:] 2025-12-04T10:35:21.1842963Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:21.1843406Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T10:35:21.1843871Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T10:35:21.1844311Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T10:35:21.1844920Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T10:35:21.1845233Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1846769Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1847231Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1848114Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1848653Z E1204 10:34:44.682000 95496 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1849449Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1850056Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1850810Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1851468Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1851993Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:21.1852604Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1852915Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1853713Z E1204 10:34:44.682000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1853824Z ('RERUN', {'yellow': True}) [0.2131s] [ 0%] 2025-12-04T10:35:21.1854780Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1855387Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1855756Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:21.1856211Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1856694Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1857169Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:21.1857610Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T10:35:21.1858077Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T10:35:21.1858516Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T10:35:21.1859192Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T10:35:21.1859494Z E1204 10:34:44.894000 95496 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1860975Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1861530Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1862417Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1862953Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1863708Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1864296Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1865047Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1865757Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return 
ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1866273Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:21.1866926Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1867239Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1868003Z E1204 10:34:44.894000 95496 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1868097Z FAILED [0.2105s] [ 0%] 2025-12-04T10:35:21.1868101Z 2025-12-04T10:35:21.1868223Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.1868467Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T10:35:21.1868577Z Traceback (most recent call last): 2025-12-04T10:35:21.1868947Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T10:35:21.1869054Z actual = torch.compile(f)(x) 2025-12-04T10:35:21.1869466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1869676Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1870121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1870284Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1870719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1870848Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1871300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1871580Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1872020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1872200Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1872613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1872755Z return self._compile_to_module() 2025-12-04T10:35:21.1873171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1873309Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1873749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1873867Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1874283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1874476Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1874988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1875093Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1875536Z File "/tmp/tmpgol57jy2/w3/cw3nv2awzzn4lic4te2mw6rkulsdb7isc3oaearntin6mzockj6z.py", line 45, in 2025-12-04T10:35:21.1875996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T10:35:21.1876099Z kernel.precompile( 2025-12-04T10:35:21.1876590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1876729Z self._precompile_worker() 2025-12-04T10:35:21.1877243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1877395Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1877905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1878080Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1878459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1878670Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1879042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1879327Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1879523Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.1879759Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1879834Z ^ 2025-12-04T10:35:21.1880234Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1880239Z 2025-12-04T10:35:21.1880848Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1880853Z 2025-12-04T10:35:21.1880857Z 2025-12-04T10:35:21.1881044Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1881656Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.1881664Z 2025-12-04T10:35:21.1881892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1882070Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1882162Z frames [('total', 1)] 2025-12-04T10:35:21.1882311Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.1882514Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1882741Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1882827Z graph_break [] 2025-12-04T10:35:21.1883069Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T10:35:21.1883180Z Traceback (most recent call last): 2025-12-04T10:35:21.1883552Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T10:35:21.1883649Z actual = torch.compile(f)(x) 2025-12-04T10:35:21.1884072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1884284Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1884718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1884890Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1885322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T10:35:21.1885453Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1885949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1886263Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1886762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1886883Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1887306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1887410Z return self._compile_to_module() 2025-12-04T10:35:21.1887816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1887965Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1888404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1888515Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1888946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1889147Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1889648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1889755Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1890195Z File "/tmp/tmpbbu2cx9k/vo/cvo6hyaq4qnyygkgomzus5xpmcnywbqcr3zph3u3yc3xx4d7y4vt.py", line 45, in 2025-12-04T10:35:21.1890601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:21.1890693Z kernel.precompile( 2025-12-04T10:35:21.1891186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1891287Z self._precompile_worker() 2025-12-04T10:35:21.1891791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1891949Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1892457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1892622Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1893052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1893292Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1893675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1893957Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1894148Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.1894389Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1894462Z ^ 2025-12-04T10:35:21.1894850Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1894867Z 2025-12-04T10:35:21.1895479Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1895486Z 2025-12-04T10:35:21.1895490Z 2025-12-04T10:35:21.1895680Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1896422Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.1896427Z 2025-12-04T10:35:21.1896654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1896839Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1896966Z frames [('total', 1)] 2025-12-04T10:35:21.1897058Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.1897272Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1897458Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1897537Z graph_break [] 2025-12-04T10:35:21.1897724Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1897807Z frames [('total', 1)] 2025-12-04T10:35:21.1897907Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.1898090Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1898294Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1898382Z graph_break [] 2025-12-04T10:35:21.1898499Z =================================== FAILURES =================================== 2025-12-04T10:35:21.1898739Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T10:35:21.1898848Z Traceback (most recent call last): 2025-12-04T10:35:21.1899292Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T10:35:21.1899396Z actual = torch.compile(f)(x) 2025-12-04T10:35:21.1899811Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1900017Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1900465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1900633Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1901067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1901192Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1901645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1901920Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1902407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1902528Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1902983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1903082Z return self._compile_to_module() 2025-12-04T10:35:21.1903499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1903635Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1904073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1904184Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1904600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:21.1904795Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1905306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1905411Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1905873Z File "/tmp/tmpprd8y_q3/7q/c7qgjtpfsa6thos3owiuex6xlfrxs4bj7vygfeco5elq246qfywl.py", line 45, in 2025-12-04T10:35:21.1906264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.1906353Z kernel.precompile( 2025-12-04T10:35:21.1906866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1906960Z self._precompile_worker() 2025-12-04T10:35:21.1907471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1907620Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1908358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1908591Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1909100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1909368Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1909841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1910130Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1910333Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.1910570Z def 
triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1910639Z ^ 2025-12-04T10:35:21.1911033Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1911040Z 2025-12-04T10:35:21.1911647Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1911652Z 2025-12-04T10:35:21.1911656Z 2025-12-04T10:35:21.1911842Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1912454Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.1912461Z 2025-12-04T10:35:21.1912703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1912881Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1913061Z frames [('total', 1)] 2025-12-04T10:35:21.1913163Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.1913421Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1913608Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1913690Z graph_break [] 2025-12-04T10:35:21.1913869Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1913953Z frames [('total', 1)] 2025-12-04T10:35:21.1914053Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.1914234Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1914439Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1914518Z graph_break [] 2025-12-04T10:35:21.1914696Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T10:35:21.1914787Z frames [('total', 1)] 2025-12-04T10:35:21.1914879Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.1915062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1915264Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1915342Z graph_break [] 2025-12-04T10:35:21.1916007Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.xml - 2025-12-04T10:35:21.1916148Z =========================== short test summary info ============================ 2025-12-04T10:35:21.1916761Z FAILED [0.2105s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.1917078Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1917148Z ^ 2025-12-04T10:35:21.1917534Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1917548Z 2025-12-04T10:35:21.1918156Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1918160Z 2025-12-04T10:35:21.1918164Z 2025-12-04T10:35:21.1918348Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1918971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.1918975Z 2025-12-04T10:35:21.1919202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1919364Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:35:21.1919529Z ================== 1 failed, 65 deselected, 2 rerun in 1.78s =================== 2025-12-04T10:35:21.1919614Z Got exit code 1 2025-12-04T10:35:21.1919707Z Retrying single test... 2025-12-04T10:35:21.1920109Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.xml 2025-12-04T10:35:21.1920243Z ============================= test session starts ============================== 2025-12-04T10:35:21.1920543Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.1920633Z cachedir: .pytest_cache 2025-12-04T10:35:21.1921088Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.1921194Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.1921282Z configfile: pytest.ini 2025-12-04T10:35:21.1921747Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:21.1921938Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:21.1922534Z stepcurrent: skipping 65 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.1922721Z Running 1 items in this shard 2025-12-04T10:35:21.1923273Z 2025-12-04T10:35:21.1924178Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1924796Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1925156Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:21.1925620Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1926094Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1926568Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:21.1927060Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T10:35:21.1927525Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T10:35:21.1928011Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T10:35:21.1928622Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), 
tmp2, None) 2025-12-04T10:35:21.1928925Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1930420Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1930878Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1931768Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1932305Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1933072Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1933650Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1934406Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1935107Z E1204 10:34:54.921000 95677 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1935692Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:21.1936311Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1936619Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1937384Z E1204 10:34:54.921000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1937494Z ('RERUN', {'yellow': True}) [1.3255s] [100%] 2025-12-04T10:35:21.1938390Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1939128Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1939489Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:21.1939950Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1940468Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1940944Z E1204 10:34:55.165000 95677 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:21.1941385Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T10:35:21.1941845Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T10:35:21.1942299Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T10:35:21.1942913Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T10:35:21.1943221Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1944700Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1945163Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1946051Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1946587Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1947429Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1948011Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1948761Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1949422Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1949944Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:21.1950559Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1950864Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1951661Z E1204 10:34:55.165000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1951811Z ('RERUN', {'yellow': True}) [0.2116s] [100%] 2025-12-04T10:35:21.1952693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.1953300Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1953663Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:21.1954126Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.1954594Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.1955075Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:21.1955514Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T10:35:21.1956006Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T10:35:21.1956480Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T10:35:21.1957091Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T10:35:21.1957389Z E1204 10:34:55.377000 95677 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.1958865Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.1959402Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.1960290Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1960822Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1961577Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1962158Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1962906Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1963601Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return 
ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1964154Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:21.1964765Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1965075Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.1965836Z E1204 10:34:55.377000 95677 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1965921Z FAILED [0.2096s] [100%] 2025-12-04T10:35:21.1965925Z 2025-12-04T10:35:21.1966053Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.1966340Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T10:35:21.1966445Z Traceback (most recent call last): 2025-12-04T10:35:21.1966811Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T10:35:21.1966908Z actual = torch.compile(f)(x) 2025-12-04T10:35:21.1967321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1967535Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1967976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1968139Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1968571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1968694Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1969148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1969431Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1969868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1970037Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1970489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1970587Z return self._compile_to_module() 2025-12-04T10:35:21.1970996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1971132Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1971576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1971685Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1972104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1972296Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1972798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1972904Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1973335Z File "/tmp/tmpf546t5lc/gz/cgzmfmc5lfvvjm43n3swwe5bcztxho6q3brd2phy2wzl6xfzx5jb.py", line 45, in 2025-12-04T10:35:21.1973773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T10:35:21.1973861Z kernel.precompile( 2025-12-04T10:35:21.1974335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1974472Z self._precompile_worker() 2025-12-04T10:35:21.1974976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1975127Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1975629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1975796Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1976173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1976378Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1976760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1977042Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1977233Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.1977465Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1977534Z ^ 2025-12-04T10:35:21.1977922Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1977927Z 2025-12-04T10:35:21.1978531Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1978536Z 2025-12-04T10:35:21.1978544Z 2025-12-04T10:35:21.1978729Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1979426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.1979434Z 2025-12-04T10:35:21.1979656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1979838Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1979989Z frames [('total', 1)] 2025-12-04T10:35:21.1980088Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.1980285Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1980508Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1980590Z graph_break [] 2025-12-04T10:35:21.1980830Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T10:35:21.1980931Z Traceback (most recent call last): 2025-12-04T10:35:21.1981299Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T10:35:21.1981394Z actual = torch.compile(f)(x) 2025-12-04T10:35:21.1981809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1982015Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1982451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1982615Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1983048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T10:35:21.1983165Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1983667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1983938Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.1984418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.1984537Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.1984941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.1985043Z return self._compile_to_module() 2025-12-04T10:35:21.1985452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.1985587Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.1986075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.1986184Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.1986603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.1986796Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.1987295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.1987400Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.1987831Z File "/tmp/tmps4gi5qmf/ac/cac4hnrw52ut4pur633pmxiwbw6zo36sgiefykofqi45pif2xrtj.py", line 45, in 2025-12-04T10:35:21.1988226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:21.1988313Z kernel.precompile( 2025-12-04T10:35:21.1988783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.1988883Z self._precompile_worker() 2025-12-04T10:35:21.1989386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.1989536Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.1990038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.1990247Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.1990628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.1990867Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.1991248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.1991528Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.1991719Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.1991963Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.1992035Z ^ 2025-12-04T10:35:21.1992419Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.1992424Z 2025-12-04T10:35:21.1993039Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.1993043Z 2025-12-04T10:35:21.1993047Z 2025-12-04T10:35:21.1993228Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.1993881Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.1993887Z 2025-12-04T10:35:21.1994110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.1994417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1994501Z frames [('total', 1)] 2025-12-04T10:35:21.1994594Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.1994800Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1995000Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1995082Z graph_break [] 2025-12-04T10:35:21.1995275Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.1995362Z frames [('total', 1)] 2025-12-04T10:35:21.1995457Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.1995653Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.1995861Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.1995945Z graph_break [] 2025-12-04T10:35:21.1996071Z =================================== FAILURES =================================== 2025-12-04T10:35:21.1996330Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T10:35:21.1996441Z Traceback (most recent call last): 2025-12-04T10:35:21.1996827Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T10:35:21.1996928Z actual = torch.compile(f)(x) 2025-12-04T10:35:21.1997368Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.1997588Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.1998061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.1998230Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.1998690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.1998825Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.1999306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.1999597Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.2000112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.2000239Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.2000718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.2000822Z return self._compile_to_module() 2025-12-04T10:35:21.2001262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.2001413Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.2001881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.2001996Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.2002442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:21.2002648Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.2003183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.2003294Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.2003804Z File "/tmp/tmpgu6qsbdx/hs/chszo5w7z5wf3c6rogt4jfxjsd4uvjjy3reehjdpvyf3mli2ider.py", line 45, in 2025-12-04T10:35:21.2004196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.2004324Z kernel.precompile( 2025-12-04T10:35:21.2004801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.2004895Z self._precompile_worker() 2025-12-04T10:35:21.2005403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.2005553Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.2006105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.2006272Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.2006650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.2006850Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.2007235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.2007514Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.2007705Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.2008186Z def 
triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2008255Z ^ 2025-12-04T10:35:21.2008649Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.2008654Z 2025-12-04T10:35:21.2009262Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.2009267Z 2025-12-04T10:35:21.2009271Z 2025-12-04T10:35:21.2009452Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.2010063Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.2010068Z 2025-12-04T10:35:21.2010294Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.2010550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.2010633Z frames [('total', 1)] 2025-12-04T10:35:21.2010730Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.2010987Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.2011172Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.2011256Z graph_break [] 2025-12-04T10:35:21.2011432Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.2011516Z frames [('total', 1)] 2025-12-04T10:35:21.2011608Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.2011790Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.2011990Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.2012076Z graph_break [] 2025-12-04T10:35:21.2012251Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T10:35:21.2012339Z frames [('total', 1)] 2025-12-04T10:35:21.2012430Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.2012607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.2012810Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.2012885Z graph_break [] 2025-12-04T10:35:21.2013499Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.xml - 2025-12-04T10:35:21.2013648Z =========================== short test summary info ============================ 2025-12-04T10:35:21.2014253Z FAILED [0.2096s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.2014572Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2014641Z ^ 2025-12-04T10:35:21.2015027Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.2015034Z 2025-12-04T10:35:21.2015641Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.2015646Z 2025-12-04T10:35:21.2015649Z 2025-12-04T10:35:21.2015835Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.2016495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.2016503Z 2025-12-04T10:35:21.2016724Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.2016873Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:35:21.2017043Z ================== 1 failed, 187 deselected, 2 rerun in 1.78s ================== 2025-12-04T10:35:21.2017126Z Got exit code 1 2025-12-04T10:35:21.2017219Z Retrying single test... 2025-12-04T10:35:21.2017692Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.xml 2025-12-04T10:35:21.2017827Z ============================= test session starts ============================== 2025-12-04T10:35:21.2018127Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.2018213Z cachedir: .pytest_cache 2025-12-04T10:35:21.2018659Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.2018761Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.2018847Z configfile: pytest.ini 2025-12-04T10:35:21.2019410Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:21.2019647Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T10:35:21.2020187Z stepcurrent: skipping 65 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.2020322Z Running 1 items in this shard 2025-12-04T10:35:21.2020327Z 2025-12-04T10:35:21.2021222Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.2021835Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2022200Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:21.2022659Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.2023134Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.2023608Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:21.2024088Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T10:35:21.2024549Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T10:35:21.2025033Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T10:35:21.2025638Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), 
tmp2, None) 2025-12-04T10:35:21.2025964Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.2027485Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.2027936Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.2028823Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.2029356Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.2030113Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.2030683Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.2031435Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.2032125Z E1204 10:35:05.471000 95858 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.2032677Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:21.2033288Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2033590Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.2034352Z E1204 10:35:05.471000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.2034460Z ('RERUN', {'yellow': True}) [1.3388s] [100%] 2025-12-04T10:35:21.2035346Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.2036033Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2036401Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:21.2036897Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.2037369Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.2037848Z E1204 10:35:05.715000 95858 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:21.2038284Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T10:35:21.2038743Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T10:35:21.2039182Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T10:35:21.2039786Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T10:35:21.2040087Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.2041572Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.2042029Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.2042912Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.2043442Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.2044307Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.2044882Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.2045633Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.2046286Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.2046804Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:21.2047422Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2047725Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.2048531Z E1204 10:35:05.715000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.2048679Z ('RERUN', {'yellow': True}) [0.2112s] [100%] 2025-12-04T10:35:21.2049561Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T10:35:21.2050168Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2050527Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T10:35:21.2050984Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T10:35:21.2051452Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T10:35:21.2051933Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T10:35:21.2052370Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T10:35:21.2052834Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T10:35:21.2053274Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T10:35:21.2053883Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T10:35:21.2054183Z E1204 10:35:05.925000 95858 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T10:35:21.2055661Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 86} 2025-12-04T10:35:21.2056252Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T10:35:21.2057135Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.2057671Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.2058426Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.2059010Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.2059813Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.2060533Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return 
ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.2061088Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T10:35:21.2061692Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2062001Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T10:35:21.2062757Z E1204 10:35:05.925000 95858 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.2062838Z FAILED [0.2084s] [100%] 2025-12-04T10:35:21.2062849Z 2025-12-04T10:35:21.2062969Z ==================================== RERUNS ==================================== 2025-12-04T10:35:21.2063213Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T10:35:21.2063321Z Traceback (most recent call last): 2025-12-04T10:35:21.2063685Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T10:35:21.2063778Z actual = torch.compile(f)(x) 2025-12-04T10:35:21.2064193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.2064408Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.2064852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.2065016Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.2065447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T10:35:21.2065570Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.2066063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.2066353Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.2066790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.2066960Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.2067410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.2067507Z return self._compile_to_module() 2025-12-04T10:35:21.2067915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.2068054Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.2068489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.2068599Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.2069013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.2069207Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.2069710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.2069815Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.2070257Z File "/tmp/tmpfh6xc2uv/h7/ch7xssf6cphasno2hppyyj53qk7amp5apj44vgbcnbvtyemftxuu.py", line 45, in 2025-12-04T10:35:21.2070688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T10:35:21.2070777Z kernel.precompile( 2025-12-04T10:35:21.2071251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.2071387Z self._precompile_worker() 2025-12-04T10:35:21.2071890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.2072042Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.2072548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.2072717Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.2073098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.2073298Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.2073672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.2073955Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.2074150Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.2074381Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2074451Z ^ 2025-12-04T10:35:21.2074842Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.2074847Z 2025-12-04T10:35:21.2075452Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.2075459Z 2025-12-04T10:35:21.2075463Z 2025-12-04T10:35:21.2075649Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.2076259Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.2076266Z 2025-12-04T10:35:21.2076486Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.2076664Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.2076792Z frames [('total', 1)] 2025-12-04T10:35:21.2076887Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.2077084Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.2077307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.2077396Z graph_break [] 2025-12-04T10:35:21.2077637Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T10:35:21.2077734Z Traceback (most recent call last): 2025-12-04T10:35:21.2078102Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T10:35:21.2078204Z actual = torch.compile(f)(x) 2025-12-04T10:35:21.2078615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.2078825Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.2079260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.2079421Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.2079852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T10:35:21.2079972Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.2080465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.2080735Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.2081218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.2081335Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.2085394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.2085524Z return self._compile_to_module() 2025-12-04T10:35:21.2085959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.2086118Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.2086598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.2086709Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.2087134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T10:35:21.2087331Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.2087829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.2087944Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.2088365Z File "/tmp/tmpv_xfm0hy/cj/ccjvvismqyeuh7hlgw7eqkeh2ngfoksrt4m6l6ndzvgiwwykpz3g.py", line 45, in 2025-12-04T10:35:21.2088763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T10:35:21.2088853Z kernel.precompile( 2025-12-04T10:35:21.2089332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.2089437Z self._precompile_worker() 2025-12-04T10:35:21.2089948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.2090097Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.2090615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.2090845Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.2091234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.2091487Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.2091865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.2092156Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.2092347Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.2092594Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2092663Z ^ 2025-12-04T10:35:21.2093055Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.2093063Z 2025-12-04T10:35:21.2093673Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.2093678Z 2025-12-04T10:35:21.2093684Z 2025-12-04T10:35:21.2093865Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.2094521Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.2094527Z 2025-12-04T10:35:21.2094753Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.2094977Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.2095075Z frames [('total', 1)] 2025-12-04T10:35:21.2095170Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.2095371Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.2095557Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.2095635Z graph_break [] 2025-12-04T10:35:21.2095841Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.2095934Z frames [('total', 1)] 2025-12-04T10:35:21.2096044Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.2096230Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.2096427Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.2096510Z graph_break [] 2025-12-04T10:35:21.2096628Z =================================== FAILURES =================================== 2025-12-04T10:35:21.2096871Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T10:35:21.2096972Z Traceback (most recent call last): 2025-12-04T10:35:21.2097342Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T10:35:21.2097442Z actual = torch.compile(f)(x) 2025-12-04T10:35:21.2097862Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:35:21.2098070Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:35:21.2098508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:35:21.2098670Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:35:21.2099221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:35:21.2099352Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:35:21.2099802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:35:21.2100071Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:35:21.2100565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T10:35:21.2100759Z compiled_module = graph.compile_to_module() 2025-12-04T10:35:21.2101170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T10:35:21.2101270Z return self._compile_to_module() 2025-12-04T10:35:21.2101678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T10:35:21.2101817Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T10:35:21.2102252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T10:35:21.2102363Z mod = PyCodeCache.load_by_key_path( 2025-12-04T10:35:21.2102778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T10:35:21.2102973Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T10:35:21.2103477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T10:35:21.2103580Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T10:35:21.2104049Z File "/tmp/tmpr7h3zpm5/24/c24xvtrwerswq67ic42j5v2wniug2ddaizenwff44lvozr44wsmm.py", line 45, in 2025-12-04T10:35:21.2104447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T10:35:21.2104575Z kernel.precompile( 2025-12-04T10:35:21.2105050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T10:35:21.2105143Z self._precompile_worker() 2025-12-04T10:35:21.2105647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T10:35:21.2105801Z compile_results.append(self._precompile_config(c)) 2025-12-04T10:35:21.2106307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T10:35:21.2106475Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T10:35:21.2106852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T10:35:21.2107056Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T10:35:21.2107436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T10:35:21.2107716Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T10:35:21.2108069Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.2108314Z def 
triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2108384Z ^ 2025-12-04T10:35:21.2108781Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.2108786Z 2025-12-04T10:35:21.2109391Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.2109396Z 2025-12-04T10:35:21.2109400Z 2025-12-04T10:35:21.2109589Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.2110205Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.2110210Z 2025-12-04T10:35:21.2110434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.2110698Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.2110782Z frames [('total', 1)] 2025-12-04T10:35:21.2110878Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.2111137Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.2111325Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.2111412Z graph_break [] 2025-12-04T10:35:21.2111587Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:35:21.2111670Z frames [('total', 1)] 2025-12-04T10:35:21.2111768Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.2111950Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.2112145Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.2112230Z graph_break [] 2025-12-04T10:35:21.2112403Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T10:35:21.2112495Z frames [('total', 1)] 2025-12-04T10:35:21.2112586Z stats [('calls_captured', 1)] 2025-12-04T10:35:21.2112768Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:35:21.2112966Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:35:21.2113050Z graph_break [] 2025-12-04T10:35:21.2113665Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.xml - 2025-12-04T10:35:21.2113814Z =========================== short test summary info ============================ 2025-12-04T10:35:21.2114479Z FAILED [0.2084s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T10:35:21.2114720Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T10:35:21.2114792Z ^ 2025-12-04T10:35:21.2115189Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T10:35:21.2115193Z 2025-12-04T10:35:21.2115808Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:35:21.2115813Z 2025-12-04T10:35:21.2115817Z 2025-12-04T10:35:21.2116001Z To execute this test, run the following from the base repo dir: 2025-12-04T10:35:21.2116613Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.2116620Z 2025-12-04T10:35:21.2116843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:35:21.2116992Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:35:21.2117165Z ================== 1 failed, 187 deselected, 2 rerun in 1.79s ================== 2025-12-04T10:35:21.2117246Z Got exit code 1 2025-12-04T10:35:21.2117655Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T10:35:21.2118004Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:35:21.2118402Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.xml 2025-12-04T10:35:21.2118547Z ============================= test session starts ============================== 2025-12-04T10:35:21.2118839Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:35:21.2118930Z cachedir: .pytest_cache 2025-12-04T10:35:21.2119383Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:35:21.2119487Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:35:21.2119631Z configfile: pytest.ini 2025-12-04T10:35:21.2120090Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:35:21.2120326Z collecting ... collected 188 items / 66 deselected / 122 selected 2025-12-04T10:35:21.2120454Z stepcurrent: skipping 66 already run items. 
2025-12-04T10:35:21.2120548Z Running 122 items in this shard 2025-12-04T10:35:21.2120555Z 2025-12-04T10:35:21.2120928Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda PASSED [1.4014s] [ 0%] 2025-12-04T10:35:21.2121695Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 1%] 2025-12-04T10:35:21.2122455Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 2%] 2025-12-04T10:35:21.2123213Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 3%] 2025-12-04T10:35:21.2124012Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 4%] 2025-12-04T10:35:21.2124782Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 4%] 2025-12-04T10:35:21.2125579Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 5%] 2025-12-04T10:35:21.2126387Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 6%] 2025-12-04T10:35:21.2127142Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 7%] 2025-12-04T10:35:21.2127576Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda SKIPPED [0.0002s] (Not supported on non B200) [ 8%] 2025-12-04T10:35:21.2128111Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 9%] 2025-12-04T10:35:21.2128930Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 9%] 2025-12-04T10:35:21.2129759Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 10%] 2025-12-04T10:35:21.2130561Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 11%] 2025-12-04T10:35:21.2131374Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 12%] 2025-12-04T10:35:21.2132171Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 13%] 2025-12-04T10:35:21.2133068Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda SKIPPED 
[0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 13%] 2025-12-04T10:35:21.2133865Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 14%] 2025-12-04T10:35:21.2134675Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 15%] 2025-12-04T10:35:21.2135462Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 16%] 2025-12-04T10:35:21.2136309Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 17%] 2025-12-04T10:35:21.2137140Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 18%] 2025-12-04T10:35:21.2137931Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 18%] 2025-12-04T10:35:21.2138785Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 19%] 2025-12-04T10:35:21.2139647Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 20%] 2025-12-04T10:35:21.2140444Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 21%] 2025-12-04T10:35:21.2141246Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 22%] 2025-12-04T10:35:21.2142051Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 22%] 2025-12-04T10:35:21.2142854Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 23%] 2025-12-04T10:35:21.2143655Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 24%] 2025-12-04T10:35:21.2144464Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 25%] 2025-12-04T10:35:21.2145257Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) 
[ 26%] 2025-12-04T10:35:21.2146169Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 27%] 2025-12-04T10:35:21.2146955Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 27%] 2025-12-04T10:35:21.2147751Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 28%] 2025-12-04T10:35:21.2148551Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 29%] 2025-12-04T10:35:21.2149366Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 30%] 2025-12-04T10:35:21.2150188Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 31%] 2025-12-04T10:35:21.2150986Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 31%] 2025-12-04T10:35:21.2151806Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and 
MI300+ and XPU devices) [ 32%] 2025-12-04T10:35:21.2152600Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 33%] 2025-12-04T10:35:21.2153477Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 34%] 2025-12-04T10:35:21.2154334Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 35%] 2025-12-04T10:35:21.2155193Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 36%] 2025-12-04T10:35:21.2156046Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 36%] 2025-12-04T10:35:21.2156896Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 37%] 2025-12-04T10:35:21.2157724Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 38%] 2025-12-04T10:35:21.2158560Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 39%] 2025-12-04T10:35:21.2159464Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 40%] 2025-12-04T10:35:21.2160308Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 40%] 2025-12-04T10:35:21.2161139Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 41%] 2025-12-04T10:35:21.2161971Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 42%] 2025-12-04T10:35:21.2162809Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 43%] 2025-12-04T10:35:21.2163608Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 44%] 2025-12-04T10:35:21.2164371Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda SKIPPED [0.0002s] (FP8 is only supported on 
H100+, SM 8.9 and MI300+ and XPU devices) [ 45%] 2025-12-04T10:35:21.2165139Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 45%] 2025-12-04T10:35:21.2165875Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 46%] 2025-12-04T10:35:21.2166489Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 47%] 2025-12-04T10:35:21.2167321Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 48%] 2025-12-04T10:35:21.2168161Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 49%] 2025-12-04T10:35:21.2168980Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 50%] 2025-12-04T10:35:21.2169813Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 50%] 2025-12-04T10:35:21.2170624Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ 
and XPU devices) [ 51%] 2025-12-04T10:35:21.2171452Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 52%] 2025-12-04T10:35:21.2172346Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 53%] 2025-12-04T10:35:21.2173183Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 54%] 2025-12-04T10:35:21.2173985Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 54%] 2025-12-04T10:35:21.2174804Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 55%] 2025-12-04T10:35:21.2175610Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 56%] 2025-12-04T10:35:21.2176552Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 57%] 2025-12-04T10:35:21.2177371Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] 
(FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 58%] 2025-12-04T10:35:21.2178241Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 59%] 2025-12-04T10:35:21.2179116Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 59%] 2025-12-04T10:35:21.2179935Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 60%] 2025-12-04T10:35:21.2180745Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 61%] 2025-12-04T10:35:21.2181563Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 62%] 2025-12-04T10:35:21.2182388Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 63%] 2025-12-04T10:35:21.2183211Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 63%] 2025-12-04T10:35:21.2184024Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 64%] 2025-12-04T10:35:21.2184843Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 65%] 2025-12-04T10:35:21.2185737Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 66%] 2025-12-04T10:35:21.2186558Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 67%] 2025-12-04T10:35:21.2187370Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 68%] 2025-12-04T10:35:21.2188204Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 68%] 2025-12-04T10:35:21.2189012Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 69%] 2025-12-04T10:35:21.2189865Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and 
XPU devices) [ 70%] 2025-12-04T10:35:21.2190661Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 71%] 2025-12-04T10:35:21.2191546Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 72%] 2025-12-04T10:35:21.2192513Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 72%] 2025-12-04T10:35:21.2193480Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 73%] 2025-12-04T10:35:21.2194428Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 74%] 2025-12-04T10:35:21.2195384Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 75%] 2025-12-04T10:35:21.2196365Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 76%] 
2025-12-04T10:35:21.2197293Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 77%] 2025-12-04T10:35:21.2198217Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 77%] 2025-12-04T10:35:21.2199214Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1397s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 78%] 2025-12-04T10:35:21.2200148Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 79%] 2025-12-04T10:35:21.2201071Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 80%] 2025-12-04T10:35:21.2202005Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 81%] 2025-12-04T10:35:21.2202928Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 
81%] 2025-12-04T10:35:21.2203917Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 82%] 2025-12-04T10:35:21.2204900Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 83%] 2025-12-04T10:35:21.2205851Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 84%] 2025-12-04T10:35:21.2206840Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 85%] 2025-12-04T10:35:21.2207900Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 86%] 2025-12-04T10:35:21.2208817Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 86%] 2025-12-04T10:35:21.2209744Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU 
devices) [ 87%] 2025-12-04T10:35:21.2210650Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 88%] 2025-12-04T10:35:21.2211572Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 89%] 2025-12-04T10:35:21.2212480Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 90%] 2025-12-04T10:35:21.2213515Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 90%] 2025-12-04T10:35:21.2214430Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 91%] 2025-12-04T10:35:21.2215284Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 92%] 2025-12-04T10:35:21.2216142Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 93%] 2025-12-04T10:35:21.2217029Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 94%] 2025-12-04T10:35:21.2217850Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 95%] 2025-12-04T10:35:21.2218744Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 95%] 2025-12-04T10:35:21.2219639Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 96%] 2025-12-04T10:35:21.2220454Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 97%] 2025-12-04T10:35:21.2221264Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 98%] 2025-12-04T10:35:21.2221864Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 99%] 2025-12-04T10:35:21.2222620Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [100%] 2025-12-04T10:35:21.2222638Z 
2025-12-04T10:35:21.2223356Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.xml - 2025-12-04T10:35:21.2223600Z ================ 1 passed, 121 skipped, 66 deselected in 1.73s ================= 2025-12-04T10:35:21.2239339Z The following tests failed consistently: ['test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda', 
'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16', 
'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda'] 2025-12-04T10:35:21.2239440Z 2025-12-04T10:35:21.2239827Z FINISHED PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_440b1865b73f9802_.log) 2025-12-04T10:35:21.2239834Z 2025-12-04T10:35:21.2240087Z Finished inductor/test_fp8 1/1 ... [2025-12-04 10:35:19.644358][4991.653573178], took 19.80min 2025-12-04T10:35:21.2240697Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.xml 2025-12-04T10:35:21.2241348Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.xml 2025-12-04T10:35:21.2241989Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.xml 2025-12-04T10:35:21.2242582Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.xml 2025-12-04T10:35:21.2243182Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.xml 2025-12-04T10:35:21.2243778Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.xml 2025-12-04T10:35:21.2244365Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.xml 2025-12-04T10:35:21.2244973Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.xml 2025-12-04T10:35:21.2245570Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.xml 2025-12-04T10:35:21.2246236Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.xml 2025-12-04T10:35:21.2246828Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.xml 2025-12-04T10:35:21.2247464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.xml 2025-12-04T10:35:21.2248075Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.xml 2025-12-04T10:35:21.2248675Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.xml 2025-12-04T10:35:21.2249274Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.xml 2025-12-04T10:35:21.2249873Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.xml 2025-12-04T10:35:21.2250472Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.xml 2025-12-04T10:35:21.2251063Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.xml 2025-12-04T10:35:21.2251658Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.xml 2025-12-04T10:35:21.2252269Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.xml 2025-12-04T10:35:21.2252866Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.xml 2025-12-04T10:35:21.2253462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.xml 2025-12-04T10:35:21.2254052Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.xml 2025-12-04T10:35:21.2254640Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.xml 2025-12-04T10:35:21.2255275Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.xml 2025-12-04T10:35:21.2255938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.xml 2025-12-04T10:35:21.2256573Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.xml 2025-12-04T10:35:21.2257159Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.xml 2025-12-04T10:35:21.2257761Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.xml 2025-12-04T10:35:21.2258353Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.xml 2025-12-04T10:35:21.2258952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.xml 2025-12-04T10:35:21.2259613Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.xml 2025-12-04T10:35:21.2260246Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.xml 2025-12-04T10:35:21.2260881Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.xml 2025-12-04T10:35:21.2261514Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.xml 2025-12-04T10:35:21.2262113Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.xml 2025-12-04T10:35:21.2262712Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.xml 2025-12-04T10:35:21.2263308Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.xml 2025-12-04T10:35:21.2263910Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.xml 2025-12-04T10:35:21.2264504Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.xml 2025-12-04T10:35:21.2265111Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.xml 2025-12-04T10:35:21.2265704Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.xml 2025-12-04T10:35:21.2266343Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.xml 2025-12-04T10:35:21.2266947Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.xml 2025-12-04T10:35:21.2267536Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.xml 2025-12-04T10:35:21.2268144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.xml 2025-12-04T10:35:21.2268750Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.xml 2025-12-04T10:35:21.2269399Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.xml 2025-12-04T10:35:21.2270037Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.xml 2025-12-04T10:35:21.2270643Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.xml 2025-12-04T10:35:21.2514198Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.xml 2025-12-04T10:35:21.2782218Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.xml 2025-12-04T10:35:21.3079188Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.xml 2025-12-04T10:35:21.3372614Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.xml 2025-12-04T10:35:21.3660864Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.xml 2025-12-04T10:35:21.4020613Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.xml 2025-12-04T10:35:21.4368048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.xml 2025-12-04T10:35:21.4683200Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.xml 2025-12-04T10:35:21.4974175Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.xml 2025-12-04T10:35:21.5253660Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.xml 2025-12-04T10:35:21.5566514Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.xml 2025-12-04T10:35:21.5897280Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.xml 2025-12-04T10:35:21.6185525Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.xml 2025-12-04T10:35:21.6509267Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.xml 2025-12-04T10:35:21.6756242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.xml 2025-12-04T10:35:21.7073485Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.xml 2025-12-04T10:35:21.7361048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.xml 2025-12-04T10:35:21.7651732Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.xml 2025-12-04T10:35:21.7955041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.xml 2025-12-04T10:35:21.8250363Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.xml 2025-12-04T10:35:21.8549607Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.xml 2025-12-04T10:35:21.8867338Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.xml 2025-12-04T10:35:21.9125174Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.xml 2025-12-04T10:35:21.9413791Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.xml 2025-12-04T10:35:21.9729077Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.xml 2025-12-04T10:35:22.0066345Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.xml 2025-12-04T10:35:22.0328554Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.xml 2025-12-04T10:35:22.0631932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.xml 2025-12-04T10:35:22.1009812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.xml 2025-12-04T10:35:22.1315479Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.xml 2025-12-04T10:35:22.1629926Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.xml 2025-12-04T10:35:22.1947532Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.xml 2025-12-04T10:35:22.2210543Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.xml 2025-12-04T10:35:22.2468201Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.xml 2025-12-04T10:35:22.2743059Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.xml 2025-12-04T10:35:22.3179229Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.xml 2025-12-04T10:35:22.3478372Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.xml 2025-12-04T10:35:22.3782863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.xml 2025-12-04T10:35:22.4073446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.xml 2025-12-04T10:35:22.4400893Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.xml 2025-12-04T10:35:22.4830719Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.xml 2025-12-04T10:35:22.5151191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.xml 2025-12-04T10:35:22.5453926Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.xml 2025-12-04T10:35:22.5813768Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.xml 2025-12-04T10:35:22.6092699Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.xml 2025-12-04T10:35:22.6639125Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.xml 2025-12-04T10:35:22.6918235Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.xml 2025-12-04T10:35:22.7194421Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.xml 2025-12-04T10:35:22.7448662Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.xml 2025-12-04T10:35:22.7742208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.xml 2025-12-04T10:35:22.8032296Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.xml 2025-12-04T10:35:22.8821508Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.xml 2025-12-04T10:35:22.9128820Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.xml 2025-12-04T10:35:22.9422120Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.xml 2025-12-04T10:35:22.9725923Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.xml 2025-12-04T10:35:23.0010633Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.xml 2025-12-04T10:35:23.3521061Z Uploading logs for 57118183212 to S3 2025-12-04T10:35:23.5029425Z Uploading artifacts took 0.47 seconds 2025-12-04T10:35:23.5029849Z inductor/test_fp8 1/1 failed! 2025-12-04T10:35:23.5033372Z Running dynamo/test_model_output 1/1 ... [2025-12-04 10:35:23.502972][4995.512194486] 2025-12-04T10:35:23.5034020Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:35:23.5038188Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_model_output.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:35:23.503420] 2025-12-04T10:35:27.5262471Z 2025-12-04T10:35:27.5264533Z dynamo/test_model_output 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_model_output_1.1_2df9271f2ebae91b_.log 2025-12-04T10:35:27.5272579Z Running 18 items in this shard: test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained, test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained_non_const_attr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_assign, test/dynamo/test_model_output.py::TestModelOutput::test_mo_create, test/dynamo/test_model_output.py::TestModelOutput::test_mo_from_outside, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr_missing, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getitem, test/dynamo/test_model_output.py::TestModelOutput::test_mo_index, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init2, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init_with_disable, test/dynamo/test_model_output.py::TestModelOutput::test_mo_newkey, 
test/dynamo/test_model_output.py::TestModelOutput::test_mo_reconstruct_bytecode, test/dynamo/test_model_output.py::TestModelOutput::test_mo_tuple, test/dynamo/test_model_output.py::TestModelOutput::test_none, test/dynamo/test_model_output.py::TestModelOutput::test_reconstruction, test/dynamo/test_model_output.py::TestModelOutputBertCUDA::test_HF_bert_model_output_cuda 2025-12-04T10:35:27.5278391Z 2025-12-04T10:35:27.5278685Z Finished dynamo/test_model_output 1/1 ... [2025-12-04 10:35:27.525811][4999.535035943], took 0.07min 2025-12-04T10:35:27.5412289Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-fcf8b9b0a2e7a178.xml 2025-12-04T10:35:27.5733722Z Running inductor/test_triton_kernels 1/1 ... [2025-12-04 10:35:27.572980][4999.582204761] 2025-12-04T10:35:27.5734381Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:35:27.5737498Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:35:27.573331] 2025-12-04T10:38:04.9944647Z 2025-12-04T10:38:04.9946990Z inductor/test_triton_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_kernels_1.1_4c43492168172809_.log 2025-12-04T10:38:05.0127164Z Running 366 items in this shard: test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_i64_input, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_layout_constraint_needs_fixed_stride_order, test/inductor/test_triton_kernels.py::KernelTests::test_no_nan_kernels, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_old, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_old, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_old, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_attrs_dict_equal_1_None_format, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching_duplicate, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_constants, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dependancies, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_cpp_wrapper, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_normal, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_mm_kernels_do_not_change, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_unaffected, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_False, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_fallback, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float16, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float32, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float64, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_functionalize, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_global_constexpr, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_higher_order_func, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inputs_buffer_reuse, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_matmul_tracking, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_aot_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_not_mark_dirty, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_type, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_False, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_none_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_out_of_order, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_reinplace_inplaceable_pass, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_slice_and_view_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input_nonzero_offset, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_to_cpu, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_inductor, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_various_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_constexpr_function, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol_with_custom_name, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_kernel_param, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_aot_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop2, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_new_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_old_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_argmax, test/inductor/test_triton_kernels.py::MutationTests::test_branch_with_multiple_yield_args, test/inductor/test_triton_kernels.py::MutationTests::test_cumsum, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_one_return, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg_2, test/inductor/test_triton_kernels.py::MutationTests::test_get_tma_stores, test/inductor/test_triton_kernels.py::MutationTests::test_labels, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_4_times_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_2d_autotuned, 
test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_block_ptr, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_import, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_atomic_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel1, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_false, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_true, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_kernel_with_block_ptr_2d, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_mul2_inplace_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_nested_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel_call, test/inductor/test_triton_kernels.py::MutationTests::test_reduce_sum, test/inductor/test_triton_kernels.py::MutationTests::test_triton_kernel_inference_mode, test/inductor/test_triton_kernels.py::MutationTests::test_while_loop, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_no_pre_or_post_hook_user_defined, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_unbacked, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_meta, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_mutable_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_triton_kernel, test/inductor/test_triton_kernels.py::CustomOpTests::test_subclass, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_dynamic_grid_no_recompile, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_inductor, test/inductor/test_triton_kernels.py::CustomOpTests::test_wrap_triton_disabled_in_triton_op
2025-12-04T10:38:05.0305217Z
2025-12-04T10:38:05.0305533Z Finished inductor/test_triton_kernels 1/1 ... [2025-12-04 10:38:04.994456][5157.003678354], took 2.62min
2025-12-04T10:38:05.0306775Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-cc2491bbd877af9c.xml
2025-12-04T10:38:05.1080369Z Running inductor/test_loop_ordering 1/1 ... [2025-12-04 10:38:05.107619][5157.116841139]
2025-12-04T10:38:05.1080898Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:38:05.1083460Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_loop_ordering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:38:05.107956]
2025-12-04T10:38:42.0923394Z
2025-12-04T10:38:42.0926456Z inductor/test_loop_ordering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_loop_ordering_1.1_cda1b68c4235c80b_.log
2025-12-04T10:38:42.0946065Z Running 53 items in this shard: test/inductor/test_loop_ordering.py::ImplDetailTest::test_merge_loops_invalidate_pw_dep_cache, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_and_merge_loops, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_modular_indexing, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_twice, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_3dred_pw_2d_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_apbt_realize, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_broadcast_shapes, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_reduction_order, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_for_reordering_reindex, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_pattern_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_reduction_with_tiled_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_with_scalar_shared_memory, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_multi_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_triton_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_keep_fake_dep, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_softmax, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_sum_fuse_with_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red_2,
test/inductor/test_loop_ordering.py::LoopOrderingTest::test_sum_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_view, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_coalescing, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps0, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps1, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps2, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps3, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_no_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads_split, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_zero, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_False, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_True, test/inductor/test_loop_ordering.py::TestTiling::test_3d_pointwise, test/inductor/test_loop_ordering.py::TestTiling::test_cat, test/inductor/test_loop_ordering.py::TestTiling::test_find_broadcast_var, test/inductor/test_loop_ordering.py::TestTiling::test_mutation_deps, test/inductor/test_loop_ordering.py::TestTiling::test_penalized_small_dim, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_cont, 
test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_tiled_reduction, test/inductor/test_loop_ordering.py::TestIndexInversion::test_inversion_cases, test/inductor/test_loop_ordering.py::TestIndexInversion::test_original_complex_expression
2025-12-04T10:38:42.0964871Z
2025-12-04T10:38:42.0965226Z Finished inductor/test_loop_ordering 1/1 ... [2025-12-04 10:38:42.092100][5194.101324917], took 0.62min
2025-12-04T10:38:42.1080258Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-66246eed1b64fd5c.xml
2025-12-04T10:38:42.1914710Z Running export/test_serdes 1/1 ... [2025-12-04 10:38:42.191046][5194.200268439]
2025-12-04T10:38:42.1915166Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:38:42.1917705Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_serdes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:38:42.191392]
2025-12-04T10:41:57.5428707Z
2025-12-04T10:41:57.5429656Z export/test_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_serdes_1.1_c37c9c83d5d3a964_.log
2025-12-04T10:41:57.5850127Z Running 880 items in this shard: test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_assume_static_by_default_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_maxsize_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_no_grad_param_inplace_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_assume_static_by_default_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_maxsize_serdes_nonstrict,
test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_no_grad_param_inplace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportTestExport::test__scaled_dot_product_flash_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_additional_inputs_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_annotate_on_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_args_type_checked_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_aten_lift_fresh_copy_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attr_assignment_extra_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_constrain_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_baddbmm_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_fake_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_real_tensor_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_buffer_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_torch_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_wrong_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ccode_python_mod_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cdist_forward_compute_mode_zero_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_check_specialized_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_checks_to_constrain_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cleanup_dynamic_markers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colin_unbacked_backed_vr_sub_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colon_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_compiling_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_access_identical_symint_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_constant_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_same_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_contains_unbacked_no_escape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_int_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_unflatten_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_input_naming_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_no_user_inp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_dup_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_requires_grad_const_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_in_eager_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_constrain_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_various_cases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_conv_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_crop_like_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cse_for_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_preserve_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_tag_metadata_re_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_batch_norm_functional_predispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_after_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_before_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_default_decomposition_core_cia_ops_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_integer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_strict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_gpu_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_float_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_auto_and_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_divisibility_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_range_violations_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_ok_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_into_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_reduce_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_to_all_single_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_distributed_reduce_scatter_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dont_duck_size_for_auto_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_double_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_list_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_with_nan_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_fake_kernel_inference_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_infers_fake_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_lr_shift_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_bounds_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_dataclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_inferred_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_generic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_user_errors_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_various_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_spec_with_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_sym_round_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ends_of_bounds_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_enum_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_does_not_reference_eager_fallback_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_when_passing_mutating_primitive_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_exception_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_expand_copy_export_handles_implicit_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_api_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_as_backend_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_lifted_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_scandim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_symbool_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_warns_constant_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_basic_pop_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_container_methods_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_op_lib_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_mutable_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cyclic_reference_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_dynamo_config_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_run_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_state_dict_hooks_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_default_kwargs_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_keyword_only_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_pytree_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_pytree_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_postional_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_function_schema_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_graph_with_no_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_bug_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_static_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_leak_compile_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_linear_preserve_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_onnx_reported_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_mod_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_at_aot_level_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_but_not_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_rnn_variants_with_warning_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_scan_pytree_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_script_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_statically_known_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_then_compile_tensor_ctor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_autocast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_complex_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_set_grad_enabled_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_wrong_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_external_call_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_weights_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_filter_traceback_frames_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_flex_attention_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_from_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_fqn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_from_node_metadata_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_full_on_scalar_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_function_holding_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hints_wrapper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hoo_inline_users_issue_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_post_autograd_op_preserved_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inductor_backend_inside_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_recursive_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_int_shape_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_intermediate_shape_comp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_exporting_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_nonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_isnonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_113041_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_157289_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_issue_161902_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_istft_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_invalid_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwargs_reorder_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_norm_unbacked_normalized_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_sharing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lazy_module_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_linear_conv_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_malformed_fqn_from_source_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mask_nonzero_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_masked_select_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_math_pow_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mismatched_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mixed_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_dict_key_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_module_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_list_slice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_with_dict_container_inp_out_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_modules_access_for_deleted_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_more_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multinomial_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multiple_definitions_same_name_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_namedtuple_input_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_native_multi_attention_head_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_dynamic_shapes_spec_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_fake_tensor_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_constant_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_init_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_shared_submodule_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_no_check_is_size_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_persistent_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_none_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonstrict_retrace_preserves_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_not_registered_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_operator_aten_tensor_mode_variant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_output_node_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pad_sequence_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_param_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_partial_patched_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_variadic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_update_preserving_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_grad_wrappers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_annotation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_requires_grad_placeholders_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_profiling_code_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_python_asserts_with_sym_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_nested_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_range_constraints_with_replacement_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_alias_dtype_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_bool_cast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_for_max_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_size_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_assert_max_upper_bound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_register_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_repeat_interleave_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replaced_unbacked_bindings_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_reshape_view_helper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retracable_ep_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retrace_pre_autograd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decomposition_supports_user_input_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prim_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prm_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_with_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sdpa_gqa_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sequential_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_example_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_as_side_effect_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_empty_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_setgrad_lifted_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_shared_submodule_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_export_for_training_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_unbacked_view_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_size_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_slice_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_solver_unsupported_sympy_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_specialize_derived_dim_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_split_const_gm_with_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_make_fx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_primitives_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_shape_attribute_assignment_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_state_tensors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_static_dim_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_context_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_regular_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_new_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_float_operators_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_or_sym_and_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_sqrt_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symbool_item_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_symfloat_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_additional_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_shapes_collection_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_tensor_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tag_ac_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_attribute_zero_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_aten_to_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_with_wrapped_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tolist_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_check_eq_commutativity_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_fn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_trace_under_fake_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_train_eval_on_exported_preautograd_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tril_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_triu_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_3d_matmul_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_deferred_runtime_retrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_expand_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_infer_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_kth_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_linear_layer_norm_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_noncontig_lin_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_pad_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_scalar_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_passthrough_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_unsqueeze_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_isinstance_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_no_unroll_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_buf_8_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_aliases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_use_embedding_twice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_user_input_and_buffer_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_custom_autograd_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_to_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_where_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_assert_separation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_index_assertions_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_tensor_constant_idx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_wrapper_module_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test__scaled_dot_product_flash_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_additional_inputs_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_annotate_on_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_args_type_checked_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_aten_lift_fresh_copy_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attr_assignment_extra_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_constrain_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_baddbmm_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_fake_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_real_tensor_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_buffer_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_torch_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_wrong_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ccode_python_mod_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cdist_forward_compute_mode_zero_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_check_specialized_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_checks_to_constrain_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cleanup_dynamic_markers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colin_unbacked_backed_vr_sub_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colon_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_compiling_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_access_identical_symint_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_constant_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_same_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_buffers_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_contains_unbacked_no_escape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_int_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_input_naming_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_no_user_inp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_dup_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_requires_grad_const_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_in_eager_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_constrain_value_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_various_cases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_conv_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_crop_like_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cse_for_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_preserve_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_tag_metadata_re_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_batch_norm_functional_predispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_after_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_before_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_default_decomposition_core_cia_ops_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_integer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_nested_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_strict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_gpu_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_float_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_auto_and_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_divisibility_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_specialization_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_range_violations_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_ok_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_into_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_reduce_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_to_all_single_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_reduce_scatter_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dont_duck_size_for_auto_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_double_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_list_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_with_nan_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_fake_kernel_inference_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_infers_fake_kernel_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_lr_shift_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_bounds_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_dataclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_inferred_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_generic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_user_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_various_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_spec_with_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_sym_round_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ends_of_bounds_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_enum_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_does_not_reference_eager_fallback_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_when_passing_mutating_primitive_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_exception_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_expand_copy_export_handles_implicit_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_api_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_as_backend_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_lifted_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_scandim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_symbool_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_warns_constant_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_basic_pop_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_container_methods_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_op_lib_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_mutable_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cyclic_reference_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_dynamo_config_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_run_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_state_dict_hooks_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_default_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_keyword_only_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_pytree_kwargs_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_pytree_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_postional_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_function_schema_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_graph_with_no_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_bug_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_static_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_leak_compile_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_linear_preserve_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_onnx_reported_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_mod_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_at_aot_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_but_not_custom_op_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_rnn_variants_with_warning_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_scan_pytree_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_script_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_statically_known_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_then_compile_tensor_ctor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_autocast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_complex_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_set_grad_enabled_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_wrong_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_external_call_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_weights_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_filter_traceback_frames_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_flex_attention_export_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_from_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fqn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_from_node_metadata_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_full_on_scalar_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_function_holding_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hints_wrapper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hoo_inline_users_issue_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_post_autograd_op_preserved_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inductor_backend_inside_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_recursive_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_int_shape_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_intermediate_shape_comp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_exporting_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_nonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_isnonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_113041_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_157289_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_161902_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_istft_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_invalid_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwargs_reorder_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_norm_unbacked_normalized_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_sharing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lazy_module_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_linear_conv_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_malformed_fqn_from_source_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_buffers_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mask_nonzero_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_masked_select_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_math_pow_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mismatched_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mixed_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_dict_key_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_list_slice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_with_dict_container_inp_out_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_modules_access_for_deleted_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_more_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multinomial_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multiple_definitions_same_name_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_namedtuple_input_export_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_native_multi_attention_head_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_dynamic_shapes_spec_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_fake_tensor_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_constant_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_init_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_check_is_size_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_persistent_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_none_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonstrict_retrace_preserves_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_not_registered_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_operator_aten_tensor_mode_variant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_output_node_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pad_sequence_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_param_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_partial_patched_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_variadic_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_update_preserving_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_grad_wrappers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_annotation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_requires_grad_placeholders_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_profiling_code_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_python_asserts_with_sym_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_nested_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_range_constraints_with_replacement_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_alias_dtype_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_bool_cast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_for_max_op_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_size_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_assert_max_upper_bound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_register_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_repeat_interleave_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replaced_unbacked_bindings_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_reshape_view_helper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retracable_ep_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retrace_pre_autograd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decomposition_supports_user_input_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prm_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_with_size_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sdpa_gqa_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sequential_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_example_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_as_side_effect_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_empty_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_setgrad_lifted_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_shared_submodule_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_export_for_training_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_unbacked_view_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_size_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_slice_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_solver_unsupported_sympy_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_specialize_derived_dim_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_split_const_gm_with_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_make_fx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_primitives_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_shape_attribute_assignment_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_tensors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_static_dim_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_context_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_regular_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_new_roots_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_float_operators_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_or_sym_and_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_sqrt_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symbool_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symfloat_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_additional_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_shapes_collection_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_tensor_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tag_ac_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_attribute_zero_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_aten_to_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_with_wrapped_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tolist_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_check_eq_commutativity_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_fn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_trace_under_fake_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_train_eval_on_exported_preautograd_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tril_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_triu_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_3d_matmul_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_deferred_runtime_retrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_expand_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_infer_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_kth_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_linear_layer_norm_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_noncontig_lin_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_pad_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_scalar_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_passthrough_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_unsqueeze_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_isinstance_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_no_unroll_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_buf_8_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_preserving_4_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_aliases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_use_embedding_twice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_user_input_and_buffer_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_custom_autograd_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_to_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_where_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_assert_separation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_index_assertions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_tensor_constant_idx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_wrapper_module_serdes_nonstrict 2025-12-04T10:41:57.6266320Z 2025-12-04T10:41:57.6266603Z Finished export/test_serdes 1/1 ... 
[2025-12-04 10:41:57.544328][5389.553550452], took 3.26min 2025-12-04T10:41:57.6267608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_serdes/export.test_serdes-38411ac3079c7061.xml 2025-12-04T10:41:57.6918610Z Running inductor/test_scatter_optimization 1/1 ... [2025-12-04 10:41:57.691388][5389.700609018] 2025-12-04T10:41:57.6919151Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:41:57.6921559Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_scatter_optimization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:41:57.691758] 2025-12-04T10:42:12.1954594Z 2025-12-04T10:42:12.1955592Z inductor/test_scatter_optimization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_scatter_optimization_1.1_38363d3a7ae9f86e_.log 2025-12-04T10:42:12.1959420Z Running 8 items in this shard: test/inductor/test_scatter_optimization.py::TestScatterOpt::test_3d_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_dense, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_non_const, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_cross_entropy_loss, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_neg_scatter_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_non_last_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_nonzero_const_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_shorter_index_tensor 2025-12-04T10:42:12.1962594Z 2025-12-04T10:42:12.1962942Z Finished inductor/test_scatter_optimization 1/1 ... 
[2025-12-04 10:42:12.195133][5404.204356776], took 0.24min 2025-12-04T10:42:12.2113623Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-ca7327bb8f17c961.xml 2025-12-04T10:42:12.2903300Z Running inductor/test_padding 1/1 ... [2025-12-04 10:42:12.289935][5404.299157682] 2025-12-04T10:42:12.2903779Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:42:12.2906391Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:42:12.290272] 2025-12-04T10:42:48.6777139Z 2025-12-04T10:42:48.6778029Z inductor/test_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_padding_1.1_3b58a6813a3709bc_.log 2025-12-04T10:42:48.6804229Z Running 55 items in this shard: test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_BertForMaskedLM, test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_LinearAndSoftmax_both_shapes, test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_nobias_LinearAndSoftmax_both_shapes, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_longformer, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_longformer_small_bs, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_nvidia_deeprecommender, test/inductor/test_padding.py::PaddingTest::test_LinearAndSoftmax_codegen, test/inductor/test_padding.py::PaddingTest::test_attention, test/inductor/test_padding.py::PaddingTest::test_cat, test/inductor/test_padding.py::PaddingTest::test_conv, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape0_alignment_bytes_32_enable_pad_False, 
test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape2_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape3_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape6_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape7_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_matmul, test/inductor/test_padding.py::PaddingTest::test_mm_padding_perf, test/inductor/test_padding.py::PaddingTest::test_nobias_LinearAndSoftmax_codegen, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape0_alignment_bytes_32_pad_output_False, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape1_alignment_bytes_32_pad_output_True, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape2_alignment_bytes_64_pad_output_False, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape3_alignment_bytes_64_pad_output_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape2_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape3_alignment_bytes_64_enable_pad_True, 
test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape6_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape7_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_pad_3d_tensor, test/inductor/test_padding.py::PaddingTest::test_pad_channels_last, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape0_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape0_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape1_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape1_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape0_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape0_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape1_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape1_float32, test/inductor/test_padding.py::PaddingTest::test_pad_strides, test/inductor/test_padding.py::PaddingTest::test_pad_strides_skip, test/inductor/test_padding.py::PaddingTest::test_padmm, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape0_perm0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape1_perm1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape2_perm2_alignment_bytes_64_enable_pad_True, 
test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape3_perm3_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape4_perm4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape5_perm5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape6_perm6_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape7_perm7_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_view 2025-12-04T10:42:48.6828166Z 2025-12-04T10:42:48.6828587Z Finished inductor/test_padding 1/1 ... [2025-12-04 10:42:48.677424][5440.686646006], took 0.61min 2025-12-04T10:42:48.6945605Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-b7f63c3b423acf1d.xml 2025-12-04T10:42:48.7802374Z Running dynamo/test_callback 1/1 ... [2025-12-04 10:42:48.779855][5440.789078719] 2025-12-04T10:42:48.7802940Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:42:48.7806108Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_callback.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:42:48.780209] 2025-12-04T10:43:02.2230395Z 2025-12-04T10:43:02.2231299Z dynamo/test_callback 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_callback_1.1_4647abf0637b193b_.log 2025-12-04T10:43:02.2233327Z Running 4 items in this shard: test/dynamo/test_callback.py::CallbackTests::test_callbacks_with_duplicate_prevention, test/dynamo/test_callback.py::CallbackTests::test_counter, test/dynamo/test_callback.py::CallbackTests::test_counter_assertion, test/dynamo/test_callback.py::CallbackTests::test_triggers 2025-12-04T10:43:02.2234681Z 2025-12-04T10:43:02.2234973Z Finished dynamo/test_callback 1/1 ... [2025-12-04 10:43:02.222596][5454.231819815], took 0.22min 2025-12-04T10:43:02.2406731Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_callback/dynamo.test_callback-6c0ee54264bcedf0.xml 2025-12-04T10:43:02.3237328Z Running inductor/test_custom_op_autotune 1/1 ... [2025-12-04 10:43:02.323317][5454.332539314] 2025-12-04T10:43:02.3237966Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:43:02.3240812Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_custom_op_autotune.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:43:02.323688] 2025-12-04T10:43:22.7815959Z 2025-12-04T10:43:22.7817470Z inductor/test_custom_op_autotune 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_custom_op_autotune_1.1_2272505dccfac9af_.log 2025-12-04T10:43:22.7821095Z Running 3 items in this shard: test/inductor/test_custom_op_autotune.py::TestCustomOpAutoTune::test_decompose_k_custom_op_autotune_dynamic_config_for_input_shape, test/inductor/test_custom_op_autotune.py::TestCustomOpAutoTune::test_multi_parameter_tuning, test/inductor/test_custom_op_autotune.py::TestCustomOpAutoTune::test_rmsnorm_custom_op_autotune_with_dynamic_shape 2025-12-04T10:43:22.7823858Z 2025-12-04T10:43:22.7824451Z Finished inductor/test_custom_op_autotune 1/1 ... [2025-12-04 10:43:22.781254][5474.790478088], took 0.34min 2025-12-04T10:43:22.7984655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_custom_op_autotune/inductor.test_custom_op_autotune-8f7d8d00cc13374f.xml 2025-12-04T10:43:22.8901374Z Running test_cuda 1/1 ... [2025-12-04 10:43:22.889718][5474.898940519] 2025-12-04T10:43:22.8902025Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:43:22.8904727Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:43:22.890069] 2025-12-04T13:44:40.0741884Z 2025-12-04T13:44:40.0742766Z PRINTING LOG FILE of test_cuda 1/1 (test/test-reports/test_cuda_1.1_5ed6ed395e86485d_.log) 2025-12-04T13:44:40.0743801Z Test results will be stored in test-reports/python-pytest/test_cuda/test_cuda-f963d2e44bab839f.xml 2025-12-04T13:44:40.0744593Z ============================= test session starts ============================== 2025-12-04T13:44:40.0745864Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:44:40.0746602Z cachedir: .pytest_cache 2025-12-04T13:44:40.0747512Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:44:40.0748677Z rootdir: /var/lib/jenkins/workspace 2025-12-04T13:44:40.0749140Z configfile: pytest.ini 2025-12-04T13:44:40.0750017Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:44:40.0751047Z collecting ... 
collected 252 items 2025-12-04T13:44:40.0751537Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T13:44:40.0855127Z Running 252 items in this shard: test/test_cuda.py::TestCuda::test_arithmetic_large_tensor, test/test_cuda.py::TestCuda::test_batch_norm_gather_stats, test/test_cuda.py::TestCuda::test_bincount_ext, test/test_cuda.py::TestCuda::test_caching_allocator_record_stream_oom, test/test_cuda.py::TestCuda::test_caching_pinned_memory, test/test_cuda.py::TestCuda::test_check_error, test/test_cuda.py::TestCuda::test_copy_non_blocking, test/test_cuda.py::TestCuda::test_copy_non_blocking_type_conversion, test/test_cuda.py::TestCuda::test_cublas_allow_bf16_reduced_precision_reduction_get_set, test/test_cuda.py::TestCuda::test_cublas_allow_fp16_accumulation_get_set, test/test_cuda.py::TestCuda::test_cublas_allow_fp16_reduced_precision_reduction_get_set, test/test_cuda.py::TestCuda::test_cublas_allow_tf32_get_set, test/test_cuda.py::TestCuda::test_cublas_multiple_threads_same_device, test/test_cuda.py::TestCuda::test_cublas_workspace_explicit_allocation, test/test_cuda.py::TestCuda::test_cuda_get_device_capability, test/test_cuda.py::TestCuda::test_cuda_get_device_name, test/test_cuda.py::TestCuda::test_cuda_get_device_properties, test/test_cuda.py::TestCuda::test_cuda_graph_allocator_propagates_stream, test/test_cuda.py::TestCuda::test_cuda_graph_error_options, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph_exec_keep_graph_False, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph_exec_keep_graph_True, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph_keep_graph_false, test/test_cuda.py::TestCuda::test_cuda_graph_raw_graph_reset_and_recapture, test/test_cuda.py::TestCuda::test_cuda_graph_tensor_item_not_allowed, test/test_cuda.py::TestCuda::test_cuda_kernel_loop_overflow, test/test_cuda.py::TestCuda::test_cuda_kernel_loop_overflow_large, 
test/test_cuda.py::TestCuda::test_cuda_memory_leak_detection_propagates_errors, test/test_cuda.py::TestCuda::test_cuda_stream_protocol, test/test_cuda.py::TestCuda::test_cudart_register, test/test_cuda.py::TestCuda::test_cudnn_allow_tf32_get_set, test/test_cuda.py::TestCuda::test_cudnn_multiple_threads_same_device, test/test_cuda.py::TestCuda::test_cusparse_multiple_threads_same_device, test/test_cuda.py::TestCuda::test_device_context_manager, test/test_cuda.py::TestCuda::test_device_count_not_cached_pre_init, test/test_cuda.py::TestCuda::test_events, test/test_cuda.py::TestCuda::test_events_elapsedtime, test/test_cuda.py::TestCuda::test_fixed_cuda_assert_async, test/test_cuda.py::TestCuda::test_float32_matmul_precision_get_set, test/test_cuda.py::TestCuda::test_fp32_precision_with_float32_matmul_precision, test/test_cuda.py::TestCuda::test_fp32_precision_with_tf32, test/test_cuda.py::TestCuda::test_gather_bool, test/test_cuda.py::TestCuda::test_gds_fails_in_ci, test/test_cuda.py::TestCuda::test_generic_stream_event, test/test_cuda.py::TestCuda::test_get_device_index, test/test_cuda.py::TestCuda::test_get_per_process_memory_fraction, test/test_cuda.py::TestCuda::test_graph_capture_oom, test/test_cuda.py::TestCuda::test_graph_capture_reset_recapture, test/test_cuda.py::TestCuda::test_graph_capture_simple, test/test_cuda.py::TestCuda::test_graph_checkpoint_preserve_rng_state, test/test_cuda.py::TestCuda::test_graph_concurrent_replay, test/test_cuda.py::TestCuda::test_graph_cudnn_dropout, test/test_cuda.py::TestCuda::test_graph_debugdump, test/test_cuda.py::TestCuda::test_graph_error, test/test_cuda.py::TestCuda::test_graph_is_current_stream_capturing, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_with_amp_cache_disabled_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_with_amp_cache_enabled_allow_unused_input, 
test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_without_amp_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_without_amp_not_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_same_pool, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_with_amp_cache_enabled_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_make_graphed_callables_without_amp_not_allow_unused_input, test/test_cuda.py::TestCuda::test_graph_manual_seed_mismatch_raises, test/test_cuda.py::TestCuda::test_graph_memory_stats_and_use_result_after_destroy_graph, test/test_cuda.py::TestCuda::test_graph_optims_with_explicitly_capturable_param_groups, test/test_cuda.py::TestCuda::test_graph_record_stream, test/test_cuda.py::TestCuda::test_graph_rng_distributions, test/test_cuda.py::TestCuda::test_graph_rng_functional, test/test_cuda.py::TestCuda::test_graph_three_successive, test/test_cuda.py::TestCuda::test_graph_timing, test/test_cuda.py::TestCuda::test_graph_two_successive, test/test_cuda.py::TestCuda::test_graph_warn_if_has_zero_nodes, test/test_cuda.py::TestCuda::test_graphsafe_set_get_rng_state, test/test_cuda.py::TestCuda::test_hip_device_count, test/test_cuda.py::TestCuda::test_host_memory_stats, test/test_cuda.py::TestCuda::test_huge_index, test/test_cuda.py::TestCuda::test_index_out_of_bounds_exception_cuda, test/test_cuda.py::TestCuda::test_invalid_status_for_legacy_api, test/test_cuda.py::TestCuda::test_is_pinned_no_context, test/test_cuda.py::TestCuda::test_lazy_init, test/test_cuda.py::TestCuda::test_manual_seed, test/test_cuda.py::TestCuda::test_matmul_device_mismatch, test/test_cuda.py::TestCuda::test_matmul_memory_use, test/test_cuda.py::TestCuda::test_max_large_axis, test/test_cuda.py::TestCuda::test_mean_fp16, test/test_cuda.py::TestCuda::test_memory_allocation, test/test_cuda.py::TestCuda::test_memory_stats, 
test/test_cuda.py::TestCuda::test_memory_stats_of_multiple_generators_and_graphs, test/test_cuda.py::TestCuda::test_min_max_inits, test/test_cuda.py::TestCuda::test_multi_device_context_manager, test/test_cuda.py::TestCuda::test_multi_device_stream_context_manager, test/test_cuda.py::TestCuda::test_multinomial_ext, test/test_cuda.py::TestCuda::test_multinomial_invalid_probs_cuda, test/test_cuda.py::TestCuda::test_noncontiguous_pinned_memory, test/test_cuda.py::TestCuda::test_norm_type_conversion, test/test_cuda.py::TestCuda::test_nvtx, test/test_cuda.py::TestCuda::test_out_of_memory, test/test_cuda.py::TestCuda::test_out_of_memory_retry, test/test_cuda.py::TestCuda::test_pinned_memory_empty_cache, test/test_cuda.py::TestCuda::test_pinned_memory_use_background_threads, test/test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister, test/test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister_multithread, test/test_cuda.py::TestCuda::test_preferred_blas_library_settings, test/test_cuda.py::TestCuda::test_prod_large, test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel, test/test_cuda.py::TestCuda::test_randint_randomness_for_large_range, test/test_cuda.py::TestCuda::test_random_no_reused_random_states_float32, test/test_cuda.py::TestCuda::test_random_no_reused_random_states_float64, test/test_cuda.py::TestCuda::test_record_stream, test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view, test/test_cuda.py::TestCuda::test_reduction_gpu_memory_accessing, test/test_cuda.py::TestCuda::test_repeat_graph_capture_cublas_workspace_memory, test/test_cuda.py::TestCuda::test_rocm_backward_pass_guard, test/test_cuda.py::TestCuda::test_serialization_array_with_empty, test/test_cuda.py::TestCuda::test_serialization_array_with_storage, test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction, test/test_cuda.py::TestCuda::test_specify_improper_device_name, test/test_cuda.py::TestCuda::test_stream_compatibility, 
test/test_cuda.py::TestCuda::test_stream_context_manager, test/test_cuda.py::TestCuda::test_stream_event_repr, test/test_cuda.py::TestCuda::test_streaming_backwards_callback, test/test_cuda.py::TestCuda::test_streaming_backwards_multiple_streams, test/test_cuda.py::TestCuda::test_streaming_backwards_sync, test/test_cuda.py::TestCuda::test_streaming_backwards_sync_graph_root, test/test_cuda.py::TestCuda::test_streams, test/test_cuda.py::TestCuda::test_sum_fp16, test/test_cuda.py::TestCuda::test_tiny_half_norm_, test/test_cuda.py::TestCuda::test_to_cpu_blocking_by_default, test/test_cuda.py::TestCuda::test_to_non_blocking, test/test_cuda.py::TestCuda::test_to_numpy, test/test_cuda.py::TestCuda::test_torch_manual_seed_seeds_cuda_devices, test/test_cuda.py::TestCuda::test_type_conversions, test/test_cuda.py::TestCuda::test_uuid, test/test_cuda.py::TestCudaMallocAsync::test_allocator_backend, test/test_cuda.py::TestCudaMallocAsync::test_allocator_fuzz, test/test_cuda.py::TestCudaMallocAsync::test_allocator_memory_fraction_setting, test/test_cuda.py::TestCudaMallocAsync::test_allocator_settings, test/test_cuda.py::TestCudaMallocAsync::test_cachingAllocator_raw_alloc, test/test_cuda.py::TestCudaMallocAsync::test_clock_speed, test/test_cuda.py::TestCudaMallocAsync::test_cpp_memory_snapshot_pickle, test/test_cuda.py::TestCudaMallocAsync::test_cycles, test/test_cuda.py::TestCudaMallocAsync::test_device_memory_used, test/test_cuda.py::TestCudaMallocAsync::test_direct_traceback, test/test_cuda.py::TestCudaMallocAsync::test_garbage_collect_expandable, test/test_cuda.py::TestCudaMallocAsync::test_max_split_expandable, test/test_cuda.py::TestCudaMallocAsync::test_memory_compile_regions, test/test_cuda.py::TestCudaMallocAsync::test_memory_plots, test/test_cuda.py::TestCudaMallocAsync::test_memory_plots_free_segment_stack, test/test_cuda.py::TestCudaMallocAsync::test_memory_plots_free_stack, test/test_cuda.py::TestCudaMallocAsync::test_memory_plots_history_context, 
test/test_cuda.py::TestCudaMallocAsync::test_memory_plots_metadata, test/test_cuda.py::TestCudaMallocAsync::test_memory_profiler_viz, test/test_cuda.py::TestCudaMallocAsync::test_memory_snapshot, test/test_cuda.py::TestCudaMallocAsync::test_memory_snapshot_script, test/test_cuda.py::TestCudaMallocAsync::test_memory_snapshot_with_cpp, test/test_cuda.py::TestCudaMallocAsync::test_notifies_oom, test/test_cuda.py::TestCudaMallocAsync::test_nvml_get_handler, test/test_cuda.py::TestCudaMallocAsync::test_power_draw, test/test_cuda.py::TestCudaMallocAsync::test_raises_oom_max_split_size_mb_setting_False, test/test_cuda.py::TestCudaMallocAsync::test_raises_oom_max_split_size_mb_setting_True, test/test_cuda.py::TestCudaMallocAsync::test_raw_amdsmi_device_count, test/test_cuda.py::TestCudaMallocAsync::test_raw_amdsmi_device_uuids, test/test_cuda.py::TestCudaMallocAsync::test_temperature, test/test_cuda.py::TestCudaMallocAsync::test_uuid_visible_devices, test/test_cuda.py::TestBlockStateAbsorption::test_additional_free_following_checkpoint, test/test_cuda.py::TestBlockStateAbsorption::test_allocate_in_thread_to_pool, test/test_cuda.py::TestBlockStateAbsorption::test_allocated_in_middle_of_segment, test/test_cuda.py::TestBlockStateAbsorption::test_assigning_back_deleter_fns_to_tensor, test/test_cuda.py::TestBlockStateAbsorption::test_check_pool_live_allocations, test/test_cuda.py::TestBlockStateAbsorption::test_middle_allocations_contiguous, test/test_cuda.py::TestBlockStateAbsorption::test_multiple_middle_allocations, test/test_cuda.py::TestBlockStateAbsorption::test_no_triton_on_import, test/test_cuda.py::TestBlockStateAbsorption::test_resnet, test/test_cuda.py::TestBlockStateAbsorption::test_simple, test/test_cuda.py::TestBlockStateAbsorption::test_tensor_dies_after_checkpoint, test/test_cuda.py::TestMemPool::test_graph_capture_reclaim_2_streams, test/test_cuda.py::TestMemPool::test_graph_capture_reclaim_4_streams, 
test/test_cuda.py::TestMemPool::test_mempool_ctx_multithread, test/test_cuda.py::TestMemPool::test_mempool_empty_cache, test/test_cuda.py::TestMemPool::test_mempool_empty_cache_inactive, test/test_cuda.py::TestMemPool::test_mempool_emptycache_multithread, test/test_cuda.py::TestMemPool::test_mempool_expandable, test/test_cuda.py::TestMemPool::test_mempool_id, test/test_cuda.py::TestMemPool::test_mempool_limited_memory_with_allocator, test/test_cuda.py::TestMemPool::test_mempool_multithread, test/test_cuda.py::TestMemPool::test_mempool_with_allocator, test/test_cuda.py::TestMemPool::test_nested_mempool, test/test_cuda.py::TestGDS::test_gds_read_write_tensors, test/test_cuda.py::TestCudaAutocast::test_autocast_banned, test/test_cuda.py::TestCudaAutocast::test_autocast_cache_leak, test/test_cuda.py::TestCudaAutocast::test_autocast_cat_jit, test/test_cuda.py::TestCudaAutocast::test_autocast_checkpointing, test/test_cuda.py::TestCudaAutocast::test_autocast_custom_cast_inputs, test/test_cuda.py::TestCudaAutocast::test_autocast_custom_deprecated_warning, test/test_cuda.py::TestCudaAutocast::test_autocast_custom_enabled, test/test_cuda.py::TestCudaAutocast::test_autocast_ignored_types, test/test_cuda.py::TestCudaAutocast::test_autocast_linalg_fp16, test/test_cuda.py::TestCudaAutocast::test_autocast_methods_expect_builtin_promote, test/test_cuda.py::TestCudaAutocast::test_autocast_methods_fp16, test/test_cuda.py::TestCudaAutocast::test_autocast_methods_fp32, test/test_cuda.py::TestCudaAutocast::test_autocast_nn_bf16, test/test_cuda.py::TestCudaAutocast::test_autocast_nn_fp16, test/test_cuda.py::TestCudaAutocast::test_autocast_nn_fp32, test/test_cuda.py::TestCudaAutocast::test_autocast_rnn, test/test_cuda.py::TestCudaAutocast::test_autocast_torch_bf16, test/test_cuda.py::TestCudaAutocast::test_autocast_torch_expect_builtin_promote, test/test_cuda.py::TestCudaAutocast::test_autocast_torch_fp16, test/test_cuda.py::TestCudaAutocast::test_autocast_torch_fp32, 
test/test_cuda.py::TestCudaAutocast::test_autocast_torch_need_autocast_promote, test/test_cuda.py::TestCudaAutocast::test_cuda_autocast_deprecated_warning, test/test_cuda.py::TestCompileKernel::test_compile_kernel, test/test_cuda.py::TestCompileKernel::test_compile_kernel_advanced, test/test_cuda.py::TestCompileKernel::test_compile_kernel_as_custom_op, test/test_cuda.py::TestCompileKernel::test_compile_kernel_cuda_headers, test/test_cuda.py::TestCompileKernel::test_compile_kernel_custom_op_validation, test/test_cuda.py::TestCompileKernel::test_compile_kernel_dlpack, test/test_cuda.py::TestCompileKernel::test_compile_kernel_double_precision, test/test_cuda.py::TestCompileKernel::test_compile_kernel_large_shared_memory, test/test_cuda.py::TestCompileKernel::test_compile_kernel_template, test/test_cuda.py::TestFXMemoryProfiler::test_fx_memory_profiler_augmentation, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_Adagrad_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_Adam_cuda_float32, 
test/test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_SGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_ASGD_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adadelta_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adamax_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_NAdam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_RAdam_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_RMSprop_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Rprop_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_AdamW_cuda_float32, test/test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_Adam_cuda_float32, 
test/test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_SGD_cuda_float32, test/test_cuda.py::TestCudaDeviceParametrizedCUDA::test_graph_external_wait_and_record_cuda
2025-12-04T13:44:40.0957469Z
2025-12-04T13:44:40.0958265Z test_cuda.py::TestCuda::test_arithmetic_large_tensor SKIPPED [0.0003s] (was disabled due to not enough memory, but actually it always fail) [ 0%]
2025-12-04T13:44:40.0959503Z test_cuda.py::TestCuda::test_batch_norm_gather_stats PASSED [0.0946s] [ 0%]
2025-12-04T13:44:40.0960403Z test_cuda.py::TestCuda::test_bincount_ext PASSED [0.0414s] [ 1%]
2025-12-04T13:44:40.0961226Z test_cuda.py::TestCuda::test_caching_allocator_record_stream_oom PASSED [0.2270s] [ 1%]
2025-12-04T13:44:40.0962103Z test_cuda.py::TestCuda::test_caching_pinned_memory PASSED [0.9978s] [ 1%]
2025-12-04T13:44:40.0963129Z test_cuda.py::TestCuda::test_check_error PASSED [0.0015s] [ 2%]
2025-12-04T13:44:40.0963971Z test_cuda.py::TestCuda::test_copy_non_blocking PASSED [0.0544s] [ 2%]
2025-12-04T13:44:40.0964986Z test_cuda.py::TestCuda::test_copy_non_blocking_type_conversion PASSED [0.1007s] [ 3%]
2025-12-04T13:44:40.0966134Z test_cuda.py::TestCuda::test_cublas_allow_bf16_reduced_precision_reduction_get_set PASSED [0.0019s] [ 3%]
2025-12-04T13:44:40.0967254Z test_cuda.py::TestCuda::test_cublas_allow_fp16_accumulation_get_set PASSED [0.0019s] [ 3%]
2025-12-04T13:44:40.0968330Z test_cuda.py::TestCuda::test_cublas_allow_fp16_reduced_precision_reduction_get_set PASSED [0.0014s] [ 4%]
2025-12-04T13:44:40.0969451Z test_cuda.py::TestCuda::test_cublas_allow_tf32_get_set PASSED [0.0013s] [ 4%]
2025-12-04T13:44:40.0970441Z test_cuda.py::TestCuda::test_cublas_multiple_threads_same_device PASSED [0.1636s] [ 5%]
2025-12-04T13:44:40.0971515Z test_cuda.py::TestCuda::test_cublas_workspace_explicit_allocation PASSED [0.0045s] [ 5%]
2025-12-04T13:44:40.0972533Z test_cuda.py::TestCuda::test_cuda_get_device_capability PASSED [0.0015s] [ 5%]
2025-12-04T13:44:40.0973475Z test_cuda.py::TestCuda::test_cuda_get_device_name PASSED [0.0015s] [ 6%]
2025-12-04T13:44:40.0974379Z test_cuda.py::TestCuda::test_cuda_get_device_properties PASSED [0.0015s] [ 6%]
2025-12-04T13:44:40.0975500Z test_cuda.py::TestCuda::test_cuda_graph_allocator_propagates_stream PASSED [0.0035s] [ 7%]
2025-12-04T13:44:40.0976530Z test_cuda.py::TestCuda::test_cuda_graph_error_options PASSED [0.0180s] [ 7%]
2025-12-04T13:44:40.0977453Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph PASSED [0.0468s] [ 7%]
2025-12-04T13:44:40.0978570Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph_exec_keep_graph_False PASSED [0.0025s] [ 8%]
2025-12-04T13:44:40.0979760Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph_exec_keep_graph_True PASSED [0.0025s] [ 8%]
2025-12-04T13:44:40.0980866Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph_keep_graph_false PASSED [0.0027s] [ 9%]
2025-12-04T13:44:40.0981951Z test_cuda.py::TestCuda::test_cuda_graph_raw_graph_reset_and_recapture PASSED [0.0027s] [ 9%]
2025-12-04T13:44:40.0983096Z test_cuda.py::TestCuda::test_cuda_graph_tensor_item_not_allowed Traceback (most recent call last):
2025-12-04T13:44:40.0983949Z File "<string>", line 17, in <module>
2025-12-04T13:44:40.0984445Z File "<string>", line 7, in my_func
2025-12-04T13:44:40.0985225Z torch.AcceleratorError: CUDA error: operation not permitted when stream is capturing
2025-12-04T13:44:40.0986788Z Search for `cudaErrorStreamCaptureUnsupported' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.0988490Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.0989558Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.0990324Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.0990819Z
2025-12-04T13:44:40.0990826Z
2025-12-04T13:44:40.0991155Z During handling of the above exception, another exception occurred:
2025-12-04T13:44:40.0991666Z
2025-12-04T13:44:40.0991836Z Traceback (most recent call last):
2025-12-04T13:44:40.0992300Z File "<string>", line 16, in <module>
2025-12-04T13:44:40.0993192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/graphs.py", line 267, in __exit__
2025-12-04T13:44:40.0994090Z self.cuda_graph.capture_end()
2025-12-04T13:44:40.0994982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/graphs.py", line 129, in capture_end
2025-12-04T13:44:40.0995952Z super().capture_end()
2025-12-04T13:44:40.0996696Z torch.AcceleratorError: CUDA error: operation failed due to a previous error during capture
2025-12-04T13:44:40.0998138Z Search for `cudaErrorStreamCaptureInvalidated' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.0999839Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1000880Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1001558Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1002020Z
2025-12-04T13:44:40.1002153Z PASSED [1.8930s] [ 9%]
2025-12-04T13:44:40.1002723Z test_cuda.py::TestCuda::test_cuda_kernel_loop_overflow PASSED [0.0334s] [ 10%]
2025-12-04T13:44:40.1003587Z test_cuda.py::TestCuda::test_cuda_kernel_loop_overflow_large PASSED [0.0771s] [ 10%]
2025-12-04T13:44:40.1004542Z test_cuda.py::TestCuda::test_cuda_memory_leak_detection_propagates_errors PASSED [0.0021s] [ 11%]
2025-12-04T13:44:40.1005528Z test_cuda.py::TestCuda::test_cuda_stream_protocol PASSED [0.0015s] [ 11%]
2025-12-04T13:44:40.1006354Z test_cuda.py::TestCuda::test_cudart_register PASSED [0.0019s] [ 11%]
2025-12-04T13:44:40.1007168Z test_cuda.py::TestCuda::test_cudnn_allow_tf32_get_set PASSED [0.0014s] [ 12%]
2025-12-04T13:44:40.1008235Z test_cuda.py::TestCuda::test_cudnn_multiple_threads_same_device PASSED [2.7467s] [ 12%]
2025-12-04T13:44:40.1009222Z test_cuda.py::TestCuda::test_cusparse_multiple_threads_same_device PASSED [36.3461s] [ 13%]
2025-12-04T13:44:40.1010171Z test_cuda.py::TestCuda::test_device_context_manager PASSED [0.0017s] [ 13%]
2025-12-04T13:44:40.1011356Z test_cuda.py::TestCuda::test_device_count_not_cached_pre_init SKIPPED [0.0002s] (requires multiple devices) [ 13%]
2025-12-04T13:44:40.1012472Z test_cuda.py::TestCuda::test_events PASSED [0.0511s] [ 14%]
2025-12-04T13:44:40.1013449Z test_cuda.py::TestCuda::test_events_elapsedtime PASSED [0.0017s] [ 14%]
2025-12-04T13:44:40.1015159Z test_cuda.py::TestCuda::test_fixed_cuda_assert_async /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/TensorCompare.cu:109: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `` failed.
2025-12-04T13:44:40.1016694Z Traceback (most recent call last):
2025-12-04T13:44:40.1017144Z File "<string>", line 4, in <module>
2025-12-04T13:44:40.1017969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1108, in synchronize
2025-12-04T13:44:40.1018857Z return torch._C._cuda_synchronize()
2025-12-04T13:44:40.1019559Z torch.AcceleratorError: CUDA error: device-side assert triggered
2025-12-04T13:44:40.1020707Z Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.1022186Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1023264Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1023943Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1024454Z
2025-12-04T13:44:40.1025399Z /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/TensorCompare.cu:109: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `` failed.
2025-12-04T13:44:40.1026750Z Traceback (most recent call last):
2025-12-04T13:44:40.1027164Z File "<string>", line 4, in <module>
2025-12-04T13:44:40.1027951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1108, in synchronize
2025-12-04T13:44:40.1028703Z return torch._C._cuda_synchronize()
2025-12-04T13:44:40.1029290Z torch.AcceleratorError: CUDA error: device-side assert triggered
2025-12-04T13:44:40.1030288Z Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.1031442Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1032125Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1032584Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1034055Z
2025-12-04T13:44:40.1034686Z /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/TensorCompare.cu:109: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `` failed.
2025-12-04T13:44:40.1035656Z Traceback (most recent call last):
2025-12-04T13:44:40.1036035Z File "<string>", line 4, in <module>
2025-12-04T13:44:40.1036621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1108, in synchronize
2025-12-04T13:44:40.1037221Z return torch._C._cuda_synchronize()
2025-12-04T13:44:40.1037636Z torch.AcceleratorError: CUDA error: device-side assert triggered
2025-12-04T13:44:40.1038419Z Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.1039394Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1040077Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1040548Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1040859Z
2025-12-04T13:44:40.1041501Z /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/TensorCompare.cu:113: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `` failed.
2025-12-04T13:44:40.1042345Z Traceback (most recent call last):
2025-12-04T13:44:40.1042696Z File "<string>", line 4, in <module>
2025-12-04T13:44:40.1043290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/__init__.py", line 1108, in synchronize
2025-12-04T13:44:40.1043887Z return torch._C._cuda_synchronize()
2025-12-04T13:44:40.1044384Z torch.AcceleratorError: CUDA error: device-side assert triggered
2025-12-04T13:44:40.1045215Z Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T13:44:40.1046194Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T13:44:40.1046878Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T13:44:40.1047346Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T13:44:40.1047665Z
2025-12-04T13:44:40.1047762Z PASSED [7.5865s] [ 15%]
2025-12-04T13:44:40.1048190Z test_cuda.py::TestCuda::test_float32_matmul_precision_get_set PASSED [0.0018s] [ 15%]
2025-12-04T13:44:40.1048882Z test_cuda.py::TestCuda::test_fp32_precision_with_float32_matmul_precision PASSED [0.0015s] [ 15%]
2025-12-04T13:44:40.1049547Z test_cuda.py::TestCuda::test_fp32_precision_with_tf32 PASSED [0.0016s] [ 16%]
2025-12-04T13:44:40.1050121Z test_cuda.py::TestCuda::test_gather_bool PASSED [0.0133s] [ 16%]
2025-12-04T13:44:40.1050674Z test_cuda.py::TestCuda::test_gds_fails_in_ci PASSED [0.8428s] [ 17%]
2025-12-04T13:44:40.1051244Z test_cuda.py::TestCuda::test_generic_stream_event PASSED [0.0031s] [ 17%]
2025-12-04T13:44:40.1051814Z test_cuda.py::TestCuda::test_get_device_index PASSED [0.0015s] [ 17%]
2025-12-04T13:44:40.1052411Z test_cuda.py::TestCuda::test_get_per_process_memory_fraction PASSED [0.0016s] [ 18%]
2025-12-04T13:44:40.1052995Z test_cuda.py::TestCuda::test_graph_capture_oom PASSED [0.3562s] [ 18%]
2025-12-04T13:44:40.1053593Z test_cuda.py::TestCuda::test_graph_capture_reset_recapture PASSED [0.0031s] [ 19%]
2025-12-04T13:44:40.1054179Z test_cuda.py::TestCuda::test_graph_capture_simple PASSED [0.0025s] [ 19%]
2025-12-04T13:44:40.1054783Z test_cuda.py::TestCuda::test_graph_checkpoint_preserve_rng_state PASSED [0.0089s] [ 19%]
2025-12-04T13:44:40.1055450Z test_cuda.py::TestCuda::test_graph_concurrent_replay PASSED [0.0143s] [ 20%]
2025-12-04T13:44:40.1056112Z test_cuda.py::TestCuda::test_graph_cudnn_dropout PASSED [0.0518s] [ 20%]
2025-12-04T13:44:40.1056847Z test_cuda.py::TestCuda::test_graph_debugdump PASSED [0.1549s] [ 21%]
2025-12-04T13:44:40.1057684Z test_cuda.py::TestCuda::test_graph_error PASSED [1.9977s] [ 21%]
2025-12-04T13:44:40.1058496Z test_cuda.py::TestCuda::test_graph_is_current_stream_capturing PASSED [0.1336s] [ 21%]
2025-12-04T13:44:40.1059653Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_with_amp_cache_disabled_allow_unused_input PASSED [0.2867s] [ 22%]
2025-12-04T13:44:40.1060846Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_with_amp_cache_enabled_allow_unused_input XFAIL [0.1578s] [ 22%]
2025-12-04T13:44:40.1061976Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_without_amp_allow_unused_input PASSED [0.2925s] [ 23%]
2025-12-04T13:44:40.1063090Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_parameterless_nograd_module_without_amp_not_allow_unused_input PASSED [0.2906s] [ 23%]
2025-12-04T13:44:40.1064293Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_same_pool SKIPPED [0.0004s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 23%]
2025-12-04T13:44:40.1065728Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_with_amp_cache_enabled_allow_unused_input XFAIL [0.3487s] [ 24%]
2025-12-04T13:44:40.1067012Z test_cuda.py::TestCuda::test_graph_make_graphed_callables_without_amp_not_allow_unused_input XFAIL [0.3263s] [ 24%]
2025-12-04T13:44:40.1068267Z test_cuda.py::TestCuda::test_graph_manual_seed_mismatch_raises PASSED [0.1424s] [ 25%]
2025-12-04T13:44:40.1069389Z test_cuda.py::TestCuda::test_graph_memory_stats_and_use_result_after_destroy_graph PASSED [1.5978s] [ 25%]
2025-12-04T13:44:40.1070638Z test_cuda.py::TestCuda::test_graph_optims_with_explicitly_capturable_param_groups PASSED [0.3633s] [ 25%]
2025-12-04T13:44:40.1071782Z test_cuda.py::TestCuda::test_graph_record_stream PASSED [0.1627s] [ 26%]
2025-12-04T13:44:40.1072681Z test_cuda.py::TestCuda::test_graph_rng_distributions PASSED [0.2154s] [ 26%]
2025-12-04T13:44:40.1073540Z test_cuda.py::TestCuda::test_graph_rng_functional PASSED [0.1421s] [ 26%]
2025-12-04T13:44:40.1074399Z test_cuda.py::TestCuda::test_graph_three_successive PASSED [0.1377s] [ 27%]
2025-12-04T13:44:40.1075337Z test_cuda.py::TestCuda::test_graph_timing PASSED [0.1339s] [ 27%]
2025-12-04T13:44:40.1076210Z test_cuda.py::TestCuda::test_graph_two_successive PASSED [0.1426s] [ 28%]
2025-12-04T13:44:40.1077114Z test_cuda.py::TestCuda::test_graph_warn_if_has_zero_nodes PASSED [0.1333s] [ 28%]
2025-12-04T13:44:40.1078025Z test_cuda.py::TestCuda::test_graphsafe_set_get_rng_state PASSED [0.1364s] [ 28%]
2025-12-04T13:44:40.1079067Z test_cuda.py::TestCuda::test_hip_device_count SKIPPED [0.0002s] (not relevant for CUDA testing) [ 29%]
2025-12-04T13:44:40.1081889Z test_cuda.py::TestCuda::test_host_memory_stats SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/148607 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 29%]
2025-12-04T13:44:40.1084835Z test_cuda.py::TestCuda::test_huge_index SKIPPED [0.1327s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 30%]
2025-12-04T13:44:40.1086446Z test_cuda.py::TestCuda::test_index_out_of_bounds_exception_cuda SKIPPED [0.1324s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 30%]
2025-12-04T13:44:40.1089572Z test_cuda.py::TestCuda::test_invalid_status_for_legacy_api SKIPPED [0.0006s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157110 for platform(s) linux, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 30%]
2025-12-04T13:44:40.1092262Z test_cuda.py::TestCuda::test_is_pinned_no_context PASSED [2.0462s] [ 31%]
2025-12-04T13:44:40.1093251Z test_cuda.py::TestCuda::test_lazy_init PASSED [3.4449s] [ 31%]
2025-12-04T13:44:40.1094121Z test_cuda.py::TestCuda::test_manual_seed PASSED [0.1355s] [ 32%]
2025-12-04T13:44:40.1095085Z test_cuda.py::TestCuda::test_matmul_device_mismatch PASSED [0.1354s] [ 32%]
2025-12-04T13:44:40.1096018Z test_cuda.py::TestCuda::test_matmul_memory_use PASSED [0.1385s] [ 32%]
2025-12-04T13:44:40.1097195Z test_cuda.py::TestCuda::test_max_large_axis SKIPPED [0.1329s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 33%]
2025-12-04T13:44:40.1098359Z test_cuda.py::TestCuda::test_mean_fp16 PASSED [0.1397s] [ 33%]
2025-12-04T13:44:40.1099300Z test_cuda.py::TestCuda::test_memory_allocation PASSED [0.2639s] [ 34%]
2025-12-04T13:44:40.1100193Z test_cuda.py::TestCuda::test_memory_stats PASSED [0.3966s] [ 34%]
2025-12-04T13:44:40.1101214Z test_cuda.py::TestCuda::test_memory_stats_of_multiple_generators_and_graphs PASSED [0.5354s] [ 34%]
2025-12-04T13:44:40.1102244Z test_cuda.py::TestCuda::test_min_max_inits PASSED [0.1334s] [ 35%]
2025-12-04T13:44:40.1103298Z test_cuda.py::TestCuda::test_multi_device_context_manager SKIPPED [0.0002s] (only one GPU detected) [ 35%]
2025-12-04T13:44:40.1104611Z test_cuda.py::TestCuda::test_multi_device_stream_context_manager SKIPPED [0.0002s] (only one GPU detected) [ 36%]
2025-12-04T13:44:40.1105882Z test_cuda.py::TestCuda::test_multinomial_ext PASSED [0.1714s] [ 36%]
2025-12-04T13:44:40.1107176Z test_cuda.py::TestCuda::test_multinomial_invalid_probs_cuda SKIPPED [0.1337s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 36%]
2025-12-04T13:44:40.1109142Z test_cuda.py::TestCuda::test_noncontiguous_pinned_memory PASSED [0.1327s] [ 37%]
2025-12-04T13:44:40.1109747Z test_cuda.py::TestCuda::test_norm_type_conversion PASSED [0.1532s] [ 37%]
2025-12-04T13:44:40.1110314Z test_cuda.py::TestCuda::test_nvtx PASSED [0.1326s] [ 38%]
2025-12-04T13:44:40.1110872Z test_cuda.py::TestCuda::test_out_of_memory PASSED [0.1340s] [ 38%]
2025-12-04T13:44:40.1111415Z test_cuda.py::TestCuda::test_out_of_memory_retry PASSED [0.1431s] [ 38%]
2025-12-04T13:44:40.1111979Z test_cuda.py::TestCuda::test_pinned_memory_empty_cache PASSED [0.1651s] [ 39%]
2025-12-04T13:44:40.1112596Z test_cuda.py::TestCuda::test_pinned_memory_use_background_threads PASSED [2.0022s] [ 39%]
2025-12-04T13:44:40.1113239Z test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister PASSED [0.1719s] [ 40%]
2025-12-04T13:44:40.1113920Z test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister_multithread PASSED [0.2822s] [ 40%]
2025-12-04T13:44:40.1114600Z test_cuda.py::TestCuda::test_preferred_blas_library_settings PASSED [3.4513s] [ 40%]
2025-12-04T13:44:40.1115196Z test_cuda.py::TestCuda::test_prod_large PASSED [4.0793s] [ 41%]
2025-12-04T13:44:40.1115792Z test_cuda.py::TestCuda::test_randint_generation_for_large_numel PASSED [0.3513s] [ 41%]
2025-12-04T13:44:40.1116438Z test_cuda.py::TestCuda::test_randint_randomness_for_large_range PASSED [0.2230s] [ 42%]
2025-12-04T13:44:40.1117096Z test_cuda.py::TestCuda::test_random_no_reused_random_states_float32 PASSED [0.6347s] [ 42%]
2025-12-04T13:44:40.1117756Z test_cuda.py::TestCuda::test_random_no_reused_random_states_float64 PASSED [0.6922s] [ 42%]
2025-12-04T13:44:40.1118367Z test_cuda.py::TestCuda::test_record_stream PASSED [0.1862s] [ 43%]
2025-12-04T13:44:40.1118998Z test_cuda.py::TestCuda::test_record_stream_on_shifted_view Command took >60min, returning 124
2025-12-04T13:44:40.1119487Z Got exit code 124
2025-12-04T13:44:40.1119704Z Retrying single test...
2025-12-04T13:44:40.1120197Z Test results will be stored in test-reports/python-pytest/test_cuda/test_cuda-4cb2e826acd2b876.xml
2025-12-04T13:44:40.1120789Z ============================= test session starts ==============================
2025-12-04T13:44:40.1121342Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:44:40.1122081Z cachedir: .pytest_cache
2025-12-04T13:44:40.1122694Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:44:40.1123430Z rootdir: /var/lib/jenkins/workspace
2025-12-04T13:44:40.1123720Z configfile: pytest.ini
2025-12-04T13:44:40.1124342Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:44:40.1125140Z collecting ... collected 252 items / 251 deselected / 1 selected
2025-12-04T13:44:40.1125887Z stepcurrent: skipping 109 already run items. Running only test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view
2025-12-04T13:44:40.1126520Z Running 1 items in this shard
2025-12-04T13:44:40.1126705Z
2025-12-04T13:44:40.1126999Z test_cuda.py::TestCuda::test_record_stream_on_shifted_view Command took >60min, returning 124
2025-12-04T13:44:40.1127496Z Got exit code 124
2025-12-04T13:44:40.1127719Z Retrying single test...
2025-12-04T13:44:40.1128201Z Test results will be stored in test-reports/python-pytest/test_cuda/test_cuda-7396cb3929bf8579.xml
2025-12-04T13:44:40.1128789Z ============================= test session starts ==============================
2025-12-04T13:44:40.1129345Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:44:40.1129840Z cachedir: .pytest_cache
2025-12-04T13:44:40.1130507Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:44:40.1131183Z rootdir: /var/lib/jenkins/workspace
2025-12-04T13:44:40.1131524Z configfile: pytest.ini
2025-12-04T13:44:40.1132137Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:44:40.1132903Z collecting ... collected 252 items / 251 deselected / 1 selected
2025-12-04T13:44:40.1133628Z stepcurrent: skipping 109 already run items. Running only test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view
2025-12-04T13:44:40.1134253Z Running 1 items in this shard
2025-12-04T13:44:40.1134438Z
2025-12-04T13:44:40.1134730Z test_cuda.py::TestCuda::test_record_stream_on_shifted_view Command took >60min, returning 124
2025-12-04T13:44:40.1135254Z Got exit code 124
2025-12-04T13:44:40.1135676Z FAILED CONSISTENTLY: test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view
2025-12-04T13:44:40.1136418Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T13:44:40.1137225Z Test results will be stored in test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.xml
2025-12-04T13:44:40.1137815Z ============================= test session starts ==============================
2025-12-04T13:44:40.1138359Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:44:40.1138858Z cachedir: .pytest_cache
2025-12-04T13:44:40.1139585Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:44:40.1140244Z rootdir: /var/lib/jenkins/workspace
2025-12-04T13:44:40.1140533Z configfile: pytest.ini
2025-12-04T13:44:40.1141140Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:44:40.1141907Z collecting ... collected 252 items / 110 deselected / 142 selected
2025-12-04T13:44:40.1142328Z stepcurrent: skipping 110 already run items.
2025-12-04T13:44:40.1142653Z Running 142 items in this shard
2025-12-04T13:44:40.1142832Z
2025-12-04T13:44:40.1143091Z test_cuda.py::TestCuda::test_reduction_gpu_memory_accessing PASSED [0.0280s] [ 0%]
2025-12-04T13:44:40.1143780Z test_cuda.py::TestCuda::test_repeat_graph_capture_cublas_workspace_memory PASSED [0.1344s] [ 1%]
2025-12-04T13:44:40.1144494Z test_cuda.py::TestCuda::test_rocm_backward_pass_guard SKIPPED [0.0003s] (ROCm-only test) [ 2%]
2025-12-04T13:44:40.1145232Z test_cuda.py::TestCuda::test_serialization_array_with_empty PASSED [0.0616s] [ 2%]
2025-12-04T13:44:40.1145910Z test_cuda.py::TestCuda::test_serialization_array_with_storage PASSED [0.0045s] [ 3%]
2025-12-04T13:44:40.1146532Z test_cuda.py::TestCuda::test_set_per_process_memory_fraction PASSED [0.0139s] [ 4%]
2025-12-04T13:44:40.1147141Z test_cuda.py::TestCuda::test_specify_improper_device_name PASSED [0.0024s] [ 4%]
2025-12-04T13:44:40.1147723Z test_cuda.py::TestCuda::test_stream_compatibility PASSED [0.0022s] [ 5%]
2025-12-04T13:44:40.1148295Z test_cuda.py::TestCuda::test_stream_context_manager PASSED [0.0015s] [ 6%]
2025-12-04T13:44:40.1148854Z test_cuda.py::TestCuda::test_stream_event_repr PASSED [0.0013s] [ 7%]
2025-12-04T13:44:40.1149433Z test_cuda.py::TestCuda::test_streaming_backwards_callback PASSED [0.0103s] [ 7%]
2025-12-04T13:44:40.1150067Z test_cuda.py::TestCuda::test_streaming_backwards_multiple_streams PASSED [0.0610s] [ 8%]
2025-12-04T13:44:40.1150694Z test_cuda.py::TestCuda::test_streaming_backwards_sync PASSED [0.0109s] [ 9%]
2025-12-04T13:44:40.1151315Z test_cuda.py::TestCuda::test_streaming_backwards_sync_graph_root PASSED [0.2600s] [ 9%]
2025-12-04T13:44:40.1161001Z test_cuda.py::TestCuda::test_streams PASSED [0.0019s] [ 10%]
2025-12-04T13:44:40.1161652Z test_cuda.py::TestCuda::test_sum_fp16 PASSED [0.0350s] [ 11%]
2025-12-04T13:44:40.1162209Z test_cuda.py::TestCuda::test_tiny_half_norm_ PASSED [0.0296s] [ 11%]
2025-12-04T13:44:40.1162773Z test_cuda.py::TestCuda::test_to_cpu_blocking_by_default PASSED [0.1097s] [ 12%]
2025-12-04T13:44:40.1163386Z test_cuda.py::TestCuda::test_to_non_blocking PASSED [0.4351s] [ 13%]
2025-12-04T13:44:40.1163947Z test_cuda.py::TestCuda::test_to_numpy PASSED [0.0019s] [ 14%]
2025-12-04T13:44:40.1164547Z test_cuda.py::TestCuda::test_torch_manual_seed_seeds_cuda_devices PASSED [0.0027s] [ 14%]
2025-12-04T13:44:40.1165191Z test_cuda.py::TestCuda::test_type_conversions PASSED [0.0025s] [ 15%]
2025-12-04T13:44:40.1165775Z test_cuda.py::TestCuda::test_uuid PASSED [0.0013s] [ 16%]
2025-12-04T13:44:40.1166362Z test_cuda.py::TestCudaMallocAsync::test_allocator_backend PASSED [1.6906s] [ 16%]
2025-12-04T13:44:40.1166991Z test_cuda.py::TestCudaMallocAsync::test_allocator_fuzz PASSED [1.2409s] [ 17%]
2025-12-04T13:44:40.1167688Z test_cuda.py::TestCudaMallocAsync::test_allocator_memory_fraction_setting PASSED [8.4148s] [ 18%]
2025-12-04T13:44:40.1168395Z test_cuda.py::TestCudaMallocAsync::test_allocator_settings PASSED [0.0060s] [ 19%]
2025-12-04T13:44:40.1169066Z test_cuda.py::TestCudaMallocAsync::test_cachingAllocator_raw_alloc PASSED [0.0108s] [ 19%]
2025-12-04T13:44:40.1169852Z test_cuda.py::TestCudaMallocAsync::test_clock_speed SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 20%]
2025-12-04T13:44:40.1170643Z test_cuda.py::TestCudaMallocAsync::test_cpp_memory_snapshot_pickle PASSED [12.2423s] [ 21%]
2025-12-04T13:44:40.1171651Z test_cuda.py::TestCudaMallocAsync::test_cycles W1204 13:44:07.887000 110856 site-packages/torch/utils/viz/_cycles.py:59] CUDA Memory changed during GC, 512 bytes freed.
2025-12-04T13:44:40.1172440Z PASSED [0.3256s] [ 21%]
2025-12-04T13:44:40.1173025Z test_cuda.py::TestCudaMallocAsync::test_device_memory_used SKIPPED [0.0003s] (pynvml/amdsmi is not available) [ 22%]
2025-12-04T13:44:40.1173802Z test_cuda.py::TestCudaMallocAsync::test_direct_traceback PASSED [0.0019s] [ 23%]
2025-12-04T13:44:40.1174457Z test_cuda.py::TestCudaMallocAsync::test_garbage_collect_expandable PASSED [0.0061s] [ 23%]
2025-12-04T13:44:40.1175167Z test_cuda.py::TestCudaMallocAsync::test_max_split_expandable PASSED [0.0092s] [ 24%]
2025-12-04T13:44:40.1175857Z test_cuda.py::TestCudaMallocAsync::test_memory_compile_regions PASSED [3.2056s] [ 25%]
2025-12-04T13:44:40.1176498Z test_cuda.py::TestCudaMallocAsync::test_memory_plots PASSED [0.0659s] [ 26%]
2025-12-04T13:44:40.1177219Z test_cuda.py::TestCudaMallocAsync::test_memory_plots_free_segment_stack PASSED [0.0051s] [ 26%]
2025-12-04T13:44:40.1177932Z test_cuda.py::TestCudaMallocAsync::test_memory_plots_free_stack PASSED [0.0048s] [ 27%]
2025-12-04T13:44:40.1178675Z test_cuda.py::TestCudaMallocAsync::test_memory_plots_history_context PASSED [0.0017s] [ 28%]
2025-12-04T13:44:40.1179462Z test_cuda.py::TestCudaMallocAsync::test_memory_plots_metadata PASSED [0.0023s] [ 28%]
2025-12-04T13:44:40.1180122Z test_cuda.py::TestCudaMallocAsync::test_memory_profiler_viz PASSED [0.0452s] [ 29%]
2025-12-04T13:44:40.1181959Z test_cuda.py::TestCudaMallocAsync::test_memory_snapshot SKIPPED [0.0007s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/126953 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 30%]
2025-12-04T13:44:40.1183784Z test_cuda.py::TestCudaMallocAsync::test_memory_snapshot_script PASSED [0.0034s] [ 30%]
2025-12-04T13:44:40.1185762Z test_cuda.py::TestCudaMallocAsync::test_memory_snapshot_with_cpp SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/137249 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 31%]
2025-12-04T13:44:40.1187578Z test_cuda.py::TestCudaMallocAsync::test_notifies_oom PASSED [0.0149s] [ 32%]
2025-12-04T13:44:40.1188368Z test_cuda.py::TestCudaMallocAsync::test_nvml_get_handler SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 33%]
2025-12-04T13:44:40.1189242Z test_cuda.py::TestCudaMallocAsync::test_power_draw SKIPPED [0.0003s] (pynvml/amdsmi is not available) [ 33%]
2025-12-04T13:44:40.1190075Z test_cuda.py::TestCudaMallocAsync::test_raises_oom_max_split_size_mb_setting_False PASSED [0.0146s] [ 34%]
2025-12-04T13:44:40.1190895Z test_cuda.py::TestCudaMallocAsync::test_raises_oom_max_split_size_mb_setting_True PASSED [0.0152s] [ 35%]
2025-12-04T13:44:40.1191770Z test_cuda.py::TestCudaMallocAsync::test_raw_amdsmi_device_count SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 35%]
2025-12-04T13:44:40.1192737Z test_cuda.py::TestCudaMallocAsync::test_raw_amdsmi_device_uuids SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 36%]
2025-12-04T13:44:40.1193655Z test_cuda.py::TestCudaMallocAsync::test_temperature SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 37%]
2025-12-04T13:44:40.1194570Z test_cuda.py::TestCudaMallocAsync::test_uuid_visible_devices SKIPPED [0.0002s] (pynvml/amdsmi is not available) [ 38%]
2025-12-04T13:44:40.1195471Z test_cuda.py::TestBlockStateAbsorption::test_additional_free_following_checkpoint PASSED [0.2267s] [ 38%]
2025-12-04T13:44:40.1196285Z test_cuda.py::TestBlockStateAbsorption::test_allocate_in_thread_to_pool PASSED [0.3993s] [ 39%]
2025-12-04T13:44:40.1197074Z test_cuda.py::TestBlockStateAbsorption::test_allocated_in_middle_of_segment PASSED [0.1978s] [ 40%]
2025-12-04T13:44:40.1197907Z test_cuda.py::TestBlockStateAbsorption::test_assigning_back_deleter_fns_to_tensor PASSED [0.2191s] [ 40%]
2025-12-04T13:44:40.1198710Z test_cuda.py::TestBlockStateAbsorption::test_check_pool_live_allocations PASSED [0.1971s] [ 41%]
2025-12-04T13:44:40.1199501Z test_cuda.py::TestBlockStateAbsorption::test_middle_allocations_contiguous PASSED [0.1975s] [ 42%]
2025-12-04T13:44:40.1200287Z test_cuda.py::TestBlockStateAbsorption::test_multiple_middle_allocations PASSED [0.1983s] [ 42%]
2025-12-04T13:44:40.1201022Z test_cuda.py::TestBlockStateAbsorption::test_no_triton_on_import PASSED [2.0633s] [ 43%]
2025-12-04T13:44:40.1201693Z test_cuda.py::TestBlockStateAbsorption::test_resnet PASSED [0.6759s] [ 44%]
2025-12-04T13:44:40.1202329Z test_cuda.py::TestBlockStateAbsorption::test_simple PASSED [0.2007s] [ 45%]
2025-12-04T13:44:40.1203082Z test_cuda.py::TestBlockStateAbsorption::test_tensor_dies_after_checkpoint PASSED [0.1988s] [ 45%]
2025-12-04T13:44:40.1203791Z test_cuda.py::TestMemPool::test_graph_capture_reclaim_2_streams PASSED [0.0024s] [ 46%]
2025-12-04T13:44:40.1204497Z test_cuda.py::TestMemPool::test_graph_capture_reclaim_4_streams PASSED [0.0025s] [ 47%]
2025-12-04T13:44:40.1206366Z test_cuda.py::TestMemPool::test_mempool_ctx_multithread SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/153460 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 47%]
2025-12-04T13:44:40.1208465Z test_cuda.py::TestMemPool::test_mempool_empty_cache PASSED [0.0019s] [ 48%]
2025-12-04T13:44:40.1210236Z test_cuda.py::TestMemPool::test_mempool_empty_cache_inactive SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/159663 for platform(s) linux, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 49%]
2025-12-04T13:44:40.1212029Z test_cuda.py::TestMemPool::test_mempool_emptycache_multithread PASSED [0.0035s] [ 50%]
2025-12-04T13:44:40.1214676Z test_cuda.py::TestMemPool::test_mempool_expandable [1/2] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=dummy_allocator -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -isystem /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /opt/conda/envs/py_3.10/include/python3.10 -fPIC -std=c++17 -c /var/lib/jenkins/.cache/torch_extensions/py310_cu128/dummy_allocator/main.cpp -o main.o
2025-12-04T13:44:40.1217795Z [2/2] c++ main.o -shared -L/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o dummy_allocator.so
2025-12-04T13:44:40.1218766Z PASSED [2.0904s] [ 50%]
2025-12-04T13:44:40.1219230Z test_cuda.py::TestMemPool::test_mempool_id PASSED [0.0012s] [ 51%]
2025-12-04T13:44:40.1221045Z test_cuda.py::TestMemPool::test_mempool_limited_memory_with_allocator SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157256 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 52%]
2025-12-04T13:44:40.1222843Z test_cuda.py::TestMemPool::test_mempool_multithread PASSED [0.0018s] [ 52%]
2025-12-04T13:44:40.1224583Z test_cuda.py::TestMemPool::test_mempool_with_allocator SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/154566 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 53%]
2025-12-04T13:44:40.1226319Z test_cuda.py::TestMemPool::test_nested_mempool PASSED [0.0026s] [ 54%]
2025-12-04T13:44:40.1227068Z test_cuda.py::TestGDS::test_gds_read_write_tensors SKIPPED [0.0002s] (Disabling as USE_CUFILE=0 by default in builds) [ 54%]
2025-12-04T13:44:40.1227826Z test_cuda.py::TestCudaAutocast::test_autocast_banned PASSED [0.0080s] [ 55%]
2025-12-04T13:44:40.1228512Z test_cuda.py::TestCudaAutocast::test_autocast_cache_leak PASSED [0.1341s] [ 56%]
2025-12-04T13:44:40.1229334Z test_cuda.py::TestCudaAutocast::test_autocast_cat_jit PASSED [0.0072s] [ 57%]
2025-12-04T13:44:40.1230041Z test_cuda.py::TestCudaAutocast::test_autocast_checkpointing PASSED [0.0095s] [ 57%]
2025-12-04T13:44:40.1230701Z test_cuda.py::TestCudaAutocast::test_autocast_custom_cast_inputs PASSED [0.0079s] [ 58%]
2025-12-04T13:44:40.1231513Z test_cuda.py::TestCudaAutocast::test_autocast_custom_deprecated_warning PASSED [0.0039s] [ 59%]
2025-12-04T13:44:40.1232220Z test_cuda.py::TestCudaAutocast::test_autocast_custom_enabled PASSED [0.1212s] [ 59%]
2025-12-04T13:44:40.1232925Z test_cuda.py::TestCudaAutocast::test_autocast_ignored_types PASSED [0.0534s] [ 60%]
2025-12-04T13:44:40.1233558Z test_cuda.py::TestCudaAutocast::test_autocast_linalg_fp16 PASSED [0.0040s] [ 61%]
2025-12-04T13:44:40.1234260Z test_cuda.py::TestCudaAutocast::test_autocast_methods_expect_builtin_promote PASSED [0.0047s] [ 61%]
2025-12-04T13:44:40.1234973Z test_cuda.py::TestCudaAutocast::test_autocast_methods_fp16 PASSED [0.0032s] [ 62%]
2025-12-04T13:44:40.1235667Z test_cuda.py::TestCudaAutocast::test_autocast_methods_fp32 PASSED [0.0035s] [ 63%]
2025-12-04T13:44:40.1236280Z test_cuda.py::TestCudaAutocast::test_autocast_nn_bf16 PASSED [0.0033s] [ 64%]
2025-12-04T13:44:40.1236887Z test_cuda.py::TestCudaAutocast::test_autocast_nn_fp16 PASSED [0.0032s] [ 64%]
2025-12-04T13:44:40.1237495Z test_cuda.py::TestCudaAutocast::test_autocast_nn_fp32 PASSED [0.0548s] [ 65%]
2025-12-04T13:44:40.1238093Z test_cuda.py::TestCudaAutocast::test_autocast_rnn PASSED [10.2264s] [ 66%]
2025-12-04T13:44:40.1238709Z test_cuda.py::TestCudaAutocast::test_autocast_torch_bf16 PASSED [0.0653s] [ 66%]
2025-12-04T13:44:40.1239448Z test_cuda.py::TestCudaAutocast::test_autocast_torch_expect_builtin_promote PASSED [0.0056s] [ 67%]
2025-12-04T13:44:40.1240139Z test_cuda.py::TestCudaAutocast::test_autocast_torch_fp16 PASSED [0.0625s] [ 68%]
2025-12-04T13:44:40.1240744Z test_cuda.py::TestCudaAutocast::test_autocast_torch_fp32 PASSED [0.8243s] [ 69%]
2025-12-04T13:44:40.1241475Z test_cuda.py::TestCudaAutocast::test_autocast_torch_need_autocast_promote PASSED [0.0468s] [ 69%]
2025-12-04T13:44:40.1242211Z test_cuda.py::TestCudaAutocast::test_cuda_autocast_deprecated_warning PASSED [0.0097s] [ 70%]
2025-12-04T13:44:40.1242876Z test_cuda.py::TestCompileKernel::test_compile_kernel PASSED [0.1258s] [ 71%]
2025-12-04T13:44:40.1243512Z test_cuda.py::TestCompileKernel::test_compile_kernel_advanced PASSED [0.1881s] [ 71%]
2025-12-04T13:44:40.1244193Z test_cuda.py::TestCompileKernel::test_compile_kernel_as_custom_op PASSED [0.0376s] [ 72%]
2025-12-04T13:44:40.1244879Z test_cuda.py::TestCompileKernel::test_compile_kernel_cuda_headers PASSED [0.0447s] [ 73%]
2025-12-04T13:44:40.1245599Z test_cuda.py::TestCompileKernel::test_compile_kernel_custom_op_validation PASSED [0.1875s] [ 73%]
2025-12-04T13:44:40.1246295Z test_cuda.py::TestCompileKernel::test_compile_kernel_dlpack PASSED [0.0298s] [ 74%]
2025-12-04T13:44:40.1246982Z test_cuda.py::TestCompileKernel::test_compile_kernel_double_precision PASSED [0.0351s] [ 75%]
2025-12-04T13:44:40.1247733Z test_cuda.py::TestCompileKernel::test_compile_kernel_large_shared_memory PASSED [0.0417s] [ 76%]
2025-12-04T13:44:40.1248431Z test_cuda.py::TestCompileKernel::test_compile_kernel_template PASSED [0.0709s] [ 76%]
2025-12-04T13:44:40.1249136Z test_cuda.py::TestFXMemoryProfiler::test_fx_memory_profiler_augmentation PASSED [0.2687s] [ 77%]
2025-12-04T13:44:40.1250056Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_AdamW_cuda_float32 PASSED [0.2208s] [ 78%]
2025-12-04T13:44:40.1251135Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_Adam_cuda_float32 PASSED [0.0038s] [ 78%]
2025-12-04T13:44:40.1252203Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_SGD_cuda_float32 PASSED [0.0030s] [ 79%]
2025-12-04T13:44:40.1253276Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_AdamW_cuda_float32 PASSED [0.0030s] [ 80%]
2025-12-04T13:44:40.1254432Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_Adam_cuda_float32 PASSED [0.0028s] [ 80%]
2025-12-04T13:44:40.1255540Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_SGD_cuda_float32 PASSED [0.0027s] [ 81%]
2025-12-04T13:44:40.1256854Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_Adagrad_cuda_float32 SKIPPED [0.0013s] (cuda is not supported for fused on Adagrad) [ 82%]
2025-12-04T13:44:40.1258013Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_AdamW_cuda_float32 PASSED [0.9212s] [ 83%]
2025-12-04T13:44:40.1258969Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_Adam_cuda_float32 PASSED [0.8974s] [ 83%]
2025-12-04T13:44:40.1259988Z test_cuda.py::TestCudaOptimsCUDA::test_grad_scaling_autocast_fused_optimizers_SGD_cuda_float32 PASSED [0.4747s] [ 84%]
2025-12-04T13:44:40.1260957Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_AdamW_cuda_float32 PASSED [0.0091s] [ 85%]
2025-12-04T13:44:40.1261950Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_Adam_cuda_float32 PASSED [0.0064s] [ 85%]
2025-12-04T13:44:40.1262946Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_False_SGD_cuda_float32 PASSED [0.0062s] [ 86%]
2025-12-04T13:44:40.1263931Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_AdamW_cuda_float32 PASSED [0.0066s] [ 87%]
2025-12-04T13:44:40.1264906Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_Adam_cuda_float32 PASSED [0.0066s] [ 88%]
2025-12-04T13:44:40.1265973Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_False_fused_True_SGD_cuda_float32 PASSED [0.0064s] [ 88%]
2025-12-04T13:44:40.1266951Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_AdamW_cuda_float32 PASSED [0.0363s] [ 89%]
2025-12-04T13:44:40.1267983Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_Adam_cuda_float32 PASSED [0.0071s] [ 90%]
2025-12-04T13:44:40.1268944Z test_cuda.py::TestCudaOptimsCUDA::test_graph_grad_scaling_foreach_True_fused_False_SGD_cuda_float32 PASSED [0.0061s] [ 90%]
2025-12-04T13:44:40.1269804Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_ASGD_cuda_float32 PASSED [0.3956s] [ 91%]
2025-12-04T13:44:40.1270555Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adadelta_cuda_float32 PASSED [0.2056s] [ 92%]
2025-12-04T13:44:40.1271308Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_AdamW_cuda_float32 PASSED [0.3401s] [ 92%]
2025-12-04T13:44:40.1272030Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adam_cuda_float32 PASSED [0.3369s] [ 93%]
2025-12-04T13:44:40.1272762Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Adamax_cuda_float32 PASSED [0.2239s] [ 94%]
2025-12-04T13:44:40.1273507Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_NAdam_cuda_float32 PASSED [0.3925s] [ 95%]
2025-12-04T13:44:40.1274241Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_RAdam_cuda_float32 PASSED [0.3729s] [ 95%]
2025-12-04T13:44:40.1274975Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_RMSprop_cuda_float32 PASSED [0.2100s] [ 96%]
2025-12-04T13:44:40.1275719Z test_cuda.py::TestCudaOptimsCUDA::test_graph_optims_Rprop_cuda_float32 PASSED [0.3199s] [ 97%]
2025-12-04T13:44:40.1276538Z test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_AdamW_cuda_float32 PASSED [0.1081s] [ 97%]
2025-12-04T13:44:40.1277423Z test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_Adam_cuda_float32 PASSED [0.1071s] [ 98%]
2025-12-04T13:44:40.1278297Z test_cuda.py::TestCudaOptimsCUDA::test_graph_scaling_fused_optimizers_SGD_cuda_float32 PASSED [0.0444s] [ 99%]
2025-12-04T13:44:40.1279195Z test_cuda.py::TestCudaDeviceParametrizedCUDA::test_graph_external_wait_and_record_cuda PASSED [1.0415s] [100%]
2025-12-04T13:44:40.1279714Z
2025-12-04T13:44:40.1280196Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.xml -
2025-12-04T13:44:40.1280972Z =============== 125 passed, 17 skipped, 110 deselected in 55.06s ===============
2025-12-04T13:44:40.1281680Z The following tests failed consistently: ['test/test_cuda.py::TestCuda::test_record_stream_on_shifted_view']
2025-12-04T13:44:40.1282167Z
2025-12-04T13:44:40.1282525Z FINISHED PRINTING LOG FILE of test_cuda 1/1 (test/test-reports/test_cuda_1.1_5ed6ed395e86485d_.log)
2025-12-04T13:44:40.1282958Z
2025-12-04T13:44:40.1283178Z Finished test_cuda 1/1 ... [2025-12-04 13:44:40.074490][16352.0837123], took 181.29min
2025-12-04T13:44:40.1284033Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.xml
2025-12-04T13:44:40.6792493Z Uploading logs for 57118183212 to S3
2025-12-04T13:44:40.8626392Z Uploading artifacts took 0.67 seconds
2025-12-04T13:44:40.8626803Z test_cuda 1/1 failed!
2025-12-04T13:44:40.8630793Z Running test_sparse 1/1 ... [2025-12-04 13:44:40.862766][16352.871989054]
2025-12-04T13:44:40.8631367Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:44:40.8635176Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:44:40.863158]
2025-12-04T14:00:07.5784017Z
2025-12-04T14:00:07.5784932Z PRINTING LOG FILE of test_sparse 1/1 (test/test-reports/test_sparse_1.1_e217f60a40d48402_.log)
2025-12-04T14:00:07.5786081Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.xml
2025-12-04T14:00:07.5786699Z ============================= test session starts ==============================
2025-12-04T14:00:07.5787424Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.5788213Z cachedir: .pytest_cache
2025-12-04T14:00:07.5788827Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.5789504Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.5789791Z configfile: pytest.ini
2025-12-04T14:00:07.5790412Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.5791094Z collecting ... collected 3100 items
2025-12-04T14:00:07.5791447Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T14:00:07.6946896Z Running 3100 items in this shard: test/test_sparse.py::TestSparseLegacyAndDeprecation::test_legacy_warnings, test/test_sparse.py::TestSparseOneOff::test_cuda_from_cpu, test/test_sparse.py::TestSparseOneOff::test_cuda_sparse_cpu_dense_add, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCOO_float64,
test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSR_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int8, 
test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bool, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex128, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int32, 
test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bool, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex128, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_float64, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_complex128, 
test/test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_any_cuda, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_assign_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_basic_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_basic_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_basic_ops_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_deterministic_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_oob_cuda, test/test_sparse.py::TestSparseCUDA::test_bmm_windows_error_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_cat_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_cat_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_clone_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_clone_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_coalesce_accepts_large_tensor_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_coalesce_reference_cycle_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_coalesce_transpose_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_contig_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_contig_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_is_coalesced_with_gradcheck_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_large_sizes_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_cuda_empty_cuda, test/test_sparse.py::TestSparseCUDA::test_div_by_sparse_error_cuda, test/test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_dsmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_dtypes_cuda, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float32, 
test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_empty_like_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_empty_like_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_copy_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float32, 
test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_device_type_inference_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_empty_indices_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_floor_divide_by_sparse_error_cuda, test/test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_hsmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_is_nonzero_cuda, test/test_sparse.py::TestSparseCUDA::test_is_sparse_cuda, test/test_sparse.py::TestSparseCUDA::test_isnan_cuda, test/test_sparse.py::TestSparseCUDA::test_legacy_new_cuda, test/test_sparse.py::TestSparseCUDA::test_legacy_new_device_cuda, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_log_softmax_float_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_mm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_mv_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_narrow_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_narrow_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_negative_indices_cuda, test/test_sparse.py::TestSparseCUDA::test_new_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_new_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_new_device_multi_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_new_device_single_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_norm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_norm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_pickle_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_print_coalesced_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_print_uncoalesced_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_resize_as_cuda, test/test_sparse.py::TestSparseCUDA::test_resize_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_resize_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_saddmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_saddmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_same_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_scalar_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_scalar_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_select_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_select_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_uint8, 
test/test_sparse.py::TestSparseCUDA::test_shared_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_shared_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_small_nnz_coalesced_cuda, test/test_sparse.py::TestSparseCUDA::test_softmax_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_spadd_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_out_bfloat16_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int16, 
test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_sum_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_to_numpy_cuda, test/test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_storage_not_null_cuda, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_t_empty_cuda_complex128, 
test/test_sparse.py::TestSparseCUDA::test_t_empty_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_transpose_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_transpose_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_zeros_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_zeros_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_slow_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_generate_simple_inputs_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_invalid_blocksize_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_uint8, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_Strided_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_Strided_cuda
2025-12-04T14:00:07.8105443Z
2025-12-04T14:00:07.8105833Z test_sparse.py::TestSparseLegacyAndDeprecation::test_legacy_warnings PASSED [0.0225s] [ 0%]
2025-12-04T14:00:07.8106556Z test_sparse.py::TestSparseOneOff::test_cuda_from_cpu PASSED [0.0259s] [ 0%]
2025-12-04T14:00:07.8107205Z test_sparse.py::TestSparseOneOff::test_cuda_sparse_cpu_dense_add PASSED [0.0020s] [ 0%]
2025-12-04T14:00:07.8108057Z test_sparse.py::TestSparseMeta::test_add_meta_SparseBSC_float64 PASSED [0.0655s] [ 0%]
2025-12-04T14:00:07.8108751Z test_sparse.py::TestSparseMeta::test_add_meta_SparseBSR_float64 PASSED [0.0628s] [ 0%]
2025-12-04T14:00:07.8109440Z test_sparse.py::TestSparseMeta::test_add_meta_SparseCOO_float64 PASSED [0.0418s] [ 0%]
2025-12-04T14:00:07.8110382Z test_sparse.py::TestSparseMeta::test_add_meta_SparseCSC_float64 PASSED [0.0579s] [ 0%]
2025-12-04T14:00:07.8111043Z test_sparse.py::TestSparseMeta::test_add_meta_SparseCSR_float64 PASSED [0.0579s] [ 0%]
2025-12-04T14:00:07.8111702Z test_sparse.py::TestSparseMeta::test_fake_SparseBSC_float64 PASSED [0.3204s] [ 0%]
2025-12-04T14:00:07.8112439Z test_sparse.py::TestSparseMeta::test_fake_SparseBSR_float64 PASSED [0.3130s] [ 0%]
2025-12-04T14:00:07.8113074Z test_sparse.py::TestSparseMeta::test_fake_SparseCOO_float64 PASSED [0.1960s] [ 0%]
2025-12-04T14:00:07.8113715Z test_sparse.py::TestSparseMeta::test_fake_SparseCSC_float64 PASSED [0.3046s] [ 0%]
2025-12-04T14:00:07.8114347Z test_sparse.py::TestSparseMeta::test_fake_SparseCSR_float64 PASSED [0.3060s] [ 0%]
2025-12-04T14:00:07.8114984Z test_sparse.py::TestSparseMeta::test_meta_SparseBSC_float64 PASSED [0.0027s] [ 0%]
2025-12-04T14:00:07.8115610Z test_sparse.py::TestSparseMeta::test_meta_SparseBSR_float64 PASSED [0.0024s] [ 0%]
2025-12-04T14:00:07.8116241Z test_sparse.py::TestSparseMeta::test_meta_SparseCOO_float64 PASSED [0.0016s] [ 0%]
2025-12-04T14:00:07.8116873Z test_sparse.py::TestSparseMeta::test_meta_SparseCSC_float64 PASSED [0.0023s] [ 0%]
2025-12-04T14:00:07.8117505Z test_sparse.py::TestSparseMeta::test_meta_SparseCSR_float64 PASSED [0.0023s] [ 0%]
2025-12-04T14:00:07.8118172Z test_sparse.py::TestSparseMeta::test_print_meta_SparseBSC_float64 PASSED [0.0020s] [ 0%]
2025-12-04T14:00:07.8118855Z test_sparse.py::TestSparseMeta::test_print_meta_SparseBSR_float64 PASSED [0.0014s] [ 0%]
2025-12-04T14:00:07.8119531Z test_sparse.py::TestSparseMeta::test_print_meta_SparseCOO_float64 PASSED [0.0013s] [ 0%]
2025-12-04T14:00:07.8120210Z test_sparse.py::TestSparseMeta::test_print_meta_SparseCSC_float64 PASSED [0.0013s] [ 0%]
2025-12-04T14:00:07.8120892Z test_sparse.py::TestSparseMeta::test_print_meta_SparseCSR_float64 PASSED [0.0013s] [ 0%]
2025-12-04T14:00:07.8121562Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSC_float64 PASSED [0.0302s] [ 0%]
2025-12-04T14:00:07.8122225Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSR_float64 PASSED [0.0298s] [ 0%]
2025-12-04T14:00:07.8123003Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseCOO_float64 PASSED [0.0272s] [ 0%]
2025-12-04T14:00:07.8123661Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSC_float64 PASSED [0.0279s] [ 0%]
2025-12-04T14:00:07.8124319Z test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSR_float64 PASSED [0.0278s] [ 0%]
2025-12-04T14:00:07.8124991Z test_sparse.py::TestSparseMeta::test_to_meta_SparseBSC_float64 PASSED [0.0540s] [ 0%]
2025-12-04T14:00:07.8125779Z test_sparse.py::TestSparseMeta::test_to_meta_SparseBSR_float64 PASSED [0.0537s] [ 0%]
2025-12-04T14:00:07.8126433Z test_sparse.py::TestSparseMeta::test_to_meta_SparseCOO_float64 PASSED [0.0407s] [ 1%]
2025-12-04T14:00:07.8127169Z test_sparse.py::TestSparseMeta::test_to_meta_SparseCSC_float64 PASSED [0.0495s] [ 1%]
2025-12-04T14:00:07.8127839Z test_sparse.py::TestSparseMeta::test_to_meta_SparseCSR_float64 PASSED [0.0503s] [ 1%]
2025-12-04T14:00:07.8128532Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSC_float64 PASSED [0.0799s] [ 1%]
2025-12-04T14:00:07.8129257Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSR_float64 PASSED [0.0795s] [ 1%]
2025-12-04T14:00:07.8129981Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCOO_float64 PASSED [0.0762s] [ 1%]
2025-12-04T14:00:07.8130702Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSC_float64 PASSED [0.0752s] [ 1%]
2025-12-04T14:00:07.8131428Z test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSR_float64 PASSED [0.0754s] [ 1%]
2025-12-04T14:00:07.8132166Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSC_float64 PASSED [0.0540s] [ 1%]
2025-12-04T14:00:07.8132888Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSR_float64 PASSED [0.0539s] [ 1%]
2025-12-04T14:00:07.8134000Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCOO_float64 PASSED [0.0414s] [ 1%]
2025-12-04T14:00:07.8135045Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSC_float64 PASSED [0.0500s] [ 1%]
2025-12-04T14:00:07.8137250Z test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSR_float64 PASSED [0.0500s] [ 1%]
2025-12-04T14:00:07.8138449Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex128 SKIPPED [0.0002s] (In-place abs not supported for complex tensors) [ 1%]
2025-12-04T14:00:07.8139742Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex64 SKIPPED [0.0002s] (In-place abs not supported for complex tensors) [ 1%]
2025-12-04T14:00:07.8140738Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float32 PASSED [0.1651s] [ 1%]
2025-12-04T14:00:07.8141521Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float64 PASSED [0.1708s] [ 1%]
2025-12-04T14:00:07.8142298Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int16 PASSED [0.0120s] [ 1%]
2025-12-04T14:00:07.8143074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int32 PASSED [0.1643s] [ 1%]
2025-12-04T14:00:07.8143846Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int64 PASSED [0.0072s] [ 1%]
2025-12-04T14:00:07.8144600Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int8 PASSED [0.1636s] [ 1%]
2025-12-04T14:00:07.8145375Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_uint8 PASSED [0.0073s] [ 1%]
2025-12-04T14:00:07.8146168Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex128 PASSED [0.1871s] [ 1%]
2025-12-04T14:00:07.8146994Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex64 PASSED [0.0094s] [ 1%]
2025-12-04T14:00:07.8147794Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float32 PASSED [0.1639s] [ 1%]
2025-12-04T14:00:07.8148586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float64 PASSED [0.0074s] [ 1%]
2025-12-04T14:00:07.8149377Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int16 PASSED [0.1635s] [ 1%]
2025-12-04T14:00:07.8150165Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int32 PASSED [0.0070s] [ 1%]
2025-12-04T14:00:07.8150935Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int64 PASSED [0.1642s] [ 1%]
2025-12-04T14:00:07.8151720Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int8 PASSED [0.0070s] [ 1%]
2025-12-04T14:00:07.8152498Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_uint8 PASSED [0.0062s] [ 1%]
2025-12-04T14:00:07.8153350Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex128 PASSED [0.1709s] [ 2%]
2025-12-04T14:00:07.8154172Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex64 PASSED [0.0077s] [ 2%]
2025-12-04T14:00:07.8155024Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float32 PASSED [0.1639s] [ 2%]
2025-12-04T14:00:07.8155827Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float64 PASSED [0.0073s] [ 2%]
2025-12-04T14:00:07.8156612Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int16 PASSED [0.1637s] [ 2%]
2025-12-04T14:00:07.8157391Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int32 PASSED [0.0070s] [ 2%]
2025-12-04T14:00:07.8158172Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int64 PASSED [0.1635s] [ 2%]
2025-12-04T14:00:07.8158944Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int8 PASSED [0.0070s] [ 2%]
2025-12-04T14:00:07.8159721Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_uint8 PASSED [0.1639s] [ 2%]
2025-12-04T14:00:07.8160523Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex128 PASSED [0.5190s] [ 2%]
2025-12-04T14:00:07.8161332Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex64 PASSED [0.8664s] [ 2%]
2025-12-04T14:00:07.8162173Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float32 PASSED [0.0079s] [ 2%]
2025-12-04T14:00:07.8162957Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float64 PASSED [0.1641s] [ 2%]
2025-12-04T14:00:07.8163735Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int16 PASSED [0.0071s] [ 2%]
2025-12-04T14:00:07.8164550Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int32 PASSED [0.1635s] [ 2%]
2025-12-04T14:00:07.8165316Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int64 PASSED [0.0070s] [ 2%]
2025-12-04T14:00:07.8166085Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int8 PASSED [0.1626s] [ 2%]
2025-12-04T14:00:07.8166854Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_uint8 PASSED [0.0070s] [ 2%]
2025-12-04T14:00:07.8167665Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex128 PASSED [0.6595s] [ 2%]
2025-12-04T14:00:07.8168480Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex64 PASSED [0.7028s] [ 2%]
2025-12-04T14:00:07.8169291Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float32 PASSED [0.1653s] [ 2%]
2025-12-04T14:00:07.8170084Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float64 PASSED [0.0074s] [ 2%]
2025-12-04T14:00:07.8170880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int16 PASSED [0.1640s] [ 2%]
2025-12-04T14:00:07.8171659Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int32 PASSED [0.0070s] [ 2%]
2025-12-04T14:00:07.8172440Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int64 PASSED [0.1638s] [ 2%]
2025-12-04T14:00:07.8173229Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int8 PASSED [0.0070s] [ 2%]
2025-12-04T14:00:07.8174018Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_uint8 PASSED [0.0062s] [ 2%]
2025-12-04T14:00:07.8174800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float32 PASSED [0.1700s] [ 2%]
2025-12-04T14:00:07.8175593Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float64 PASSED [0.0073s] [ 2%]
2025-12-04T14:00:07.8176374Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int16 PASSED [0.1639s] [ 2%]
2025-12-04T14:00:07.8177150Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int32 PASSED [0.0071s] [ 2%]
2025-12-04T14:00:07.8177934Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int64 PASSED [0.1639s] [ 3%]
2025-12-04T14:00:07.8178708Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int8 PASSED [0.0071s] [ 3%]
2025-12-04T14:00:07.8179584Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_uint8 PASSED [0.1639s] [ 3%]
2025-12-04T14:00:07.8180551Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex128 SKIPPED [0.0028s] (Skipped! Out not supported) [ 3%]
2025-12-04T14:00:07.8181638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 3%]
2025-12-04T14:00:07.8182705Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float32 SKIPPED [0.0026s] (Skipped! Out not supported) [ 3%]
2025-12-04T14:00:07.8183758Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 3%]
2025-12-04T14:00:07.8184792Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int16 SKIPPED [0.0025s] (Skipped! Out not supported) [ 3%]
2025-12-04T14:00:07.8185826Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 3%]
2025-12-04T14:00:07.8186862Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int64 SKIPPED [0.0028s] (Skipped! Out not supported) [ 3%]
2025-12-04T14:00:07.8187890Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 3%]
2025-12-04T14:00:07.8188955Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 3%]
2025-12-04T14:00:07.8190134Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex128 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [ 3%]
2025-12-04T14:00:07.8191523Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex64 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [ 3%]
2025-12-04T14:00:07.8192859Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float32 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [ 3%]
2025-12-04T14:00:07.8194183Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float64 SKIPPED [0.0002s] (Skipped!
conj_physical_ not implemented for sparse) [ 3%] 2025-12-04T14:00:07.8195471Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int16 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [ 3%] 2025-12-04T14:00:07.8196760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int32 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [ 3%] 2025-12-04T14:00:07.8198049Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int64 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [ 3%] 2025-12-04T14:00:07.8199400Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int8 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [ 3%] 2025-12-04T14:00:07.8200687Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_uint8 SKIPPED [0.0002s] (Skipped! conj_physical_ not implemented for sparse) [ 3%] 2025-12-04T14:00:07.8201764Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float32 PASSED [0.1647s] [ 3%] 2025-12-04T14:00:07.8202608Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float64 PASSED [0.0073s] [ 3%] 2025-12-04T14:00:07.8203428Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int16 PASSED [0.1640s] [ 3%] 2025-12-04T14:00:07.8204237Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int32 PASSED [0.0070s] [ 3%] 2025-12-04T14:00:07.8205056Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int64 PASSED [0.1639s] [ 3%] 2025-12-04T14:00:07.8205869Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int8 PASSED [0.0070s] [ 3%] 2025-12-04T14:00:07.8206716Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_uint8 PASSED [0.1638s] [ 3%] 2025-12-04T14:00:07.8207517Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float32 PASSED [0.0103s] [ 3%] 
2025-12-04T14:00:07.8208687Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float64 PASSED [0.1648s] [ 3%] 2025-12-04T14:00:07.8209491Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int16 PASSED [0.0070s] [ 3%] 2025-12-04T14:00:07.8210300Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int32 PASSED [0.1639s] [ 4%] 2025-12-04T14:00:07.8211094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int64 PASSED [0.0070s] [ 4%] 2025-12-04T14:00:07.8211881Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int8 PASSED [0.1634s] [ 4%] 2025-12-04T14:00:07.8212658Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_uint8 PASSED [0.0070s] [ 4%] 2025-12-04T14:00:07.8213448Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float32 PASSED [0.1644s] [ 4%] 2025-12-04T14:00:07.8214268Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float64 PASSED [0.2153s] [ 4%] 2025-12-04T14:00:07.8215085Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int16 PASSED [0.3799s] [ 4%] 2025-12-04T14:00:07.8215900Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int32 PASSED [0.0070s] [ 4%] 2025-12-04T14:00:07.8216764Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int64 PASSED [0.1638s] [ 4%] 2025-12-04T14:00:07.8217583Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int8 PASSED [0.0070s] [ 4%] 2025-12-04T14:00:07.8218485Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_uint8 PASSED [0.0062s] [ 4%] 2025-12-04T14:00:07.8219383Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex128 PASSED [0.1772s] [ 4%] 2025-12-04T14:00:07.8220209Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex64 PASSED [0.0075s] [ 4%] 2025-12-04T14:00:07.8221029Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float32 PASSED 
[0.1639s] [ 4%] 2025-12-04T14:00:07.8221863Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float64 PASSED [0.0074s] [ 4%] 2025-12-04T14:00:07.8222646Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int16 PASSED [0.1640s] [ 4%] 2025-12-04T14:00:07.8223464Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int32 PASSED [0.0071s] [ 4%] 2025-12-04T14:00:07.8224262Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int64 PASSED [0.1635s] [ 4%] 2025-12-04T14:00:07.8225077Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int8 PASSED [0.0070s] [ 4%] 2025-12-04T14:00:07.8225875Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_uint8 PASSED [0.1646s] [ 4%] 2025-12-04T14:00:07.8226674Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float32 PASSED [0.0074s] [ 4%] 2025-12-04T14:00:07.8227488Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float64 PASSED [0.1639s] [ 4%] 2025-12-04T14:00:07.8228302Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int16 PASSED [0.0071s] [ 4%] 2025-12-04T14:00:07.8229093Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int32 PASSED [0.1633s] [ 4%] 2025-12-04T14:00:07.8229904Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int64 PASSED [0.0071s] [ 4%] 2025-12-04T14:00:07.8230684Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int8 PASSED [0.1637s] [ 4%] 2025-12-04T14:00:07.8231469Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_uint8 PASSED [0.0070s] [ 4%] 2025-12-04T14:00:07.8232281Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float32 PASSED [0.1642s] [ 4%] 2025-12-04T14:00:07.8233095Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float64 PASSED [0.0074s] [ 4%] 2025-12-04T14:00:07.8234123Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex128 SKIPPED [0.0026s] (Skipped! Out not supported) [ 4%] 2025-12-04T14:00:07.8235252Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex64 SKIPPED [0.0028s] (Skipped! Out not supported) [ 4%] 2025-12-04T14:00:07.8236339Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8237412Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8238505Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int16 SKIPPED [0.0028s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8239588Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8240638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8241682Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8242742Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8243865Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex128 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8244964Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8246080Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8247148Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float64 SKIPPED [0.0025s] (Skipped! 
Out not supported) [ 5%] 2025-12-04T14:00:07.8248216Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8249265Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8250338Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8251389Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8252445Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_uint8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8253514Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8254613Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8255706Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8256789Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int32 SKIPPED [0.0029s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8257865Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8259000Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8260132Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_uint8 SKIPPED [0.0028s] (Skipped! 
Out not supported) [ 5%] 2025-12-04T14:00:07.8261213Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8262361Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8263484Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8264570Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8265652Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8266730Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 5%] 2025-12-04T14:00:07.8267811Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_uint8 SKIPPED [0.0028s] (Skipped! 
Out not supported) [ 5%] 2025-12-04T14:00:07.8268812Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex128 PASSED [0.1708s] [ 5%] 2025-12-04T14:00:07.8269664Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex64 PASSED [0.0075s] [ 6%] 2025-12-04T14:00:07.8270485Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float32 PASSED [0.1646s] [ 6%] 2025-12-04T14:00:07.8271351Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float64 PASSED [0.0073s] [ 6%] 2025-12-04T14:00:07.8272175Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int16 PASSED [0.1644s] [ 6%] 2025-12-04T14:00:07.8272972Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int32 PASSED [0.0070s] [ 6%] 2025-12-04T14:00:07.8273806Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int64 PASSED [0.1641s] [ 6%] 2025-12-04T14:00:07.8274604Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int8 PASSED [0.0070s] [ 6%] 2025-12-04T14:00:07.8275401Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_uint8 PASSED [0.1640s] [ 6%] 2025-12-04T14:00:07.8276233Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float32 PASSED [0.0074s] [ 6%] 2025-12-04T14:00:07.8277081Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float64 PASSED [0.1648s] [ 6%] 2025-12-04T14:00:07.8277927Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int16 PASSED [0.0070s] [ 6%] 2025-12-04T14:00:07.8278800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int32 PASSED [0.1643s] [ 6%] 2025-12-04T14:00:07.8279653Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int64 PASSED [0.0071s] [ 6%] 2025-12-04T14:00:07.8280470Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int8 PASSED [0.1644s] [ 6%] 2025-12-04T14:00:07.8281293Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_uint8 PASSED [0.0071s] [ 6%] 2025-12-04T14:00:07.8282124Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex128 PASSED [0.2274s] [ 6%] 2025-12-04T14:00:07.8282931Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex64 PASSED [0.0728s] [ 6%] 2025-12-04T14:00:07.8283735Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float32 PASSED [0.1730s] [ 6%] 2025-12-04T14:00:07.8284531Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float64 PASSED [0.0074s] [ 6%] 2025-12-04T14:00:07.8285335Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int16 PASSED [0.1661s] [ 6%] 2025-12-04T14:00:07.8286111Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int32 PASSED [0.0071s] [ 6%] 2025-12-04T14:00:07.8286896Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int64 PASSED [0.1642s] [ 6%] 2025-12-04T14:00:07.8287681Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int8 PASSED [0.0072s] [ 6%] 2025-12-04T14:00:07.8288530Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_uint8 PASSED [0.1644s] [ 6%] 2025-12-04T14:00:07.8289548Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 6%] 2025-12-04T14:00:07.8290737Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 6%] 2025-12-04T14:00:07.8291911Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 6%] 2025-12-04T14:00:07.8293078Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int32 SKIPPED [0.0027s] (Skipped! 
Out not supported) [ 6%] 2025-12-04T14:00:07.8294229Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 6%] 2025-12-04T14:00:07.8295387Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 6%] 2025-12-04T14:00:07.8296543Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_uint8 SKIPPED [0.0028s] (Skipped! Out not supported) [ 6%] 2025-12-04T14:00:07.8297703Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex128 SKIPPED [0.0025s] (Skipped! Out not supported) [ 7%] 2025-12-04T14:00:07.8298915Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 7%] 2025-12-04T14:00:07.8300075Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 7%] 2025-12-04T14:00:07.8301219Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 7%] 2025-12-04T14:00:07.8302309Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 7%] 2025-12-04T14:00:07.8303391Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 7%] 2025-12-04T14:00:07.8304472Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 7%] 2025-12-04T14:00:07.8305558Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 7%] 2025-12-04T14:00:07.8306639Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_uint8 SKIPPED [0.0028s] (Skipped! 
Out not supported) [ 7%] 2025-12-04T14:00:07.8307585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float32 PASSED [0.1643s] [ 7%] 2025-12-04T14:00:07.8308561Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float64 PASSED [0.0073s] [ 7%] 2025-12-04T14:00:07.8309423Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int16 PASSED [0.1641s] [ 7%] 2025-12-04T14:00:07.8310294Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int32 PASSED [0.0070s] [ 7%] 2025-12-04T14:00:07.8311152Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int64 PASSED [0.1645s] [ 7%] 2025-12-04T14:00:07.8312006Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int8 PASSED [0.0070s] [ 7%] 2025-12-04T14:00:07.8312877Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_uint8 PASSED [0.1644s] [ 7%] 2025-12-04T14:00:07.8313743Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float32 PASSED [0.0074s] [ 7%] 2025-12-04T14:00:07.8314586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float64 PASSED [0.1650s] [ 7%] 2025-12-04T14:00:07.8315442Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int16 PASSED [0.0070s] [ 7%] 2025-12-04T14:00:07.8316293Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int32 PASSED [0.1643s] [ 7%] 2025-12-04T14:00:07.8317166Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int64 PASSED [0.0071s] [ 7%] 2025-12-04T14:00:07.8318020Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int8 PASSED [0.1642s] [ 7%] 2025-12-04T14:00:07.8318836Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_uint8 PASSED [0.0070s] [ 7%] 2025-12-04T14:00:07.8319668Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex128 PASSED [0.2760s] [ 7%] 2025-12-04T14:00:07.8320492Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex64 PASSED [0.1459s] [ 7%] 2025-12-04T14:00:07.8321285Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float32 PASSED [0.1656s] [ 7%] 2025-12-04T14:00:07.8322098Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float64 PASSED [0.0073s] [ 7%] 2025-12-04T14:00:07.8322880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int16 PASSED [0.1650s] [ 7%] 2025-12-04T14:00:07.8323667Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int32 PASSED [0.0071s] [ 7%] 2025-12-04T14:00:07.8324444Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int64 PASSED [0.1647s] [ 7%] 2025-12-04T14:00:07.8325222Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int8 PASSED [0.0071s] [ 7%] 2025-12-04T14:00:07.8326097Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_uint8 PASSED [0.1649s] [ 8%] 2025-12-04T14:00:07.8326880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float32 PASSED [0.0074s] [ 8%] 2025-12-04T14:00:07.8327736Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float64 PASSED [0.1653s] [ 8%] 2025-12-04T14:00:07.8328544Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int16 PASSED [0.0071s] [ 8%] 2025-12-04T14:00:07.8329340Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int32 PASSED [0.1649s] [ 8%] 2025-12-04T14:00:07.8330131Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int64 PASSED [0.0071s] [ 8%] 2025-12-04T14:00:07.8330919Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int8 PASSED [0.1650s] [ 8%] 2025-12-04T14:00:07.8331708Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_uint8 PASSED [0.0070s] [ 8%] 2025-12-04T14:00:07.8332648Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float32 SKIPPED [0.0026s] (Skipped! 
Out not supported) [ 8%] 2025-12-04T14:00:07.8346786Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 8%] 2025-12-04T14:00:07.8347909Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 8%] 2025-12-04T14:00:07.8349005Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 8%] 2025-12-04T14:00:07.8350046Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int64 SKIPPED [0.0028s] (Skipped! Out not supported) [ 8%] 2025-12-04T14:00:07.8351080Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 8%] 2025-12-04T14:00:07.8352119Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 8%] 2025-12-04T14:00:07.8353040Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex128 PASSED [0.4415s] [ 8%] 2025-12-04T14:00:07.8353833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex64 PASSED [0.5103s] [ 8%] 2025-12-04T14:00:07.8354608Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float32 PASSED [0.1665s] [ 8%] 2025-12-04T14:00:07.8355376Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float64 PASSED [0.0073s] [ 8%] 2025-12-04T14:00:07.8356222Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int16 PASSED [0.1649s] [ 8%] 2025-12-04T14:00:07.8356973Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int32 PASSED [0.0070s] [ 8%] 2025-12-04T14:00:07.8357765Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int64 PASSED [0.1650s] [ 8%] 2025-12-04T14:00:07.8358538Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int8 PASSED [0.0070s] [ 8%] 2025-12-04T14:00:07.8359324Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_uint8 PASSED [0.1649s] [ 8%] 2025-12-04T14:00:07.8360098Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex128 PASSED [0.2794s] [ 8%] 2025-12-04T14:00:07.8360895Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex64 PASSED [0.6642s] [ 8%] 2025-12-04T14:00:07.8361679Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float32 PASSED [0.0080s] [ 8%] 2025-12-04T14:00:07.8362459Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float64 PASSED [0.1654s] [ 8%] 2025-12-04T14:00:07.8363225Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int16 PASSED [0.0070s] [ 8%] 2025-12-04T14:00:07.8363987Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int32 PASSED [0.1651s] [ 8%] 2025-12-04T14:00:07.8364744Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int64 PASSED [0.0070s] [ 8%] 2025-12-04T14:00:07.8365541Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int8 PASSED [0.1652s] [ 9%] 2025-12-04T14:00:07.8366291Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_uint8 PASSED [0.0070s] [ 9%] 2025-12-04T14:00:07.8367179Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex128 PASSED [0.4947s] [ 9%] 2025-12-04T14:00:07.8367970Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex64 PASSED [0.6225s] [ 9%] 2025-12-04T14:00:07.8368784Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float32 PASSED [0.1657s] [ 9%] 2025-12-04T14:00:07.8369602Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float64 PASSED [0.0074s] [ 9%] 2025-12-04T14:00:07.8370507Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int16 PASSED [0.1653s] [ 9%] 2025-12-04T14:00:07.8371265Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int32 PASSED [0.0070s] [ 9%] 
2025-12-04T14:00:07.8372035Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int64 PASSED [0.1648s] [ 9%] 2025-12-04T14:00:07.8372789Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int8 PASSED [0.0070s] [ 9%] 2025-12-04T14:00:07.8373545Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_uint8 PASSED [0.1651s] [ 9%] 2025-12-04T14:00:07.8374320Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex128 PASSED [0.0102s] [ 9%] 2025-12-04T14:00:07.8375102Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex64 PASSED [0.1660s] [ 9%] 2025-12-04T14:00:07.8375881Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float32 PASSED [0.0073s] [ 9%] 2025-12-04T14:00:07.8376651Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float64 PASSED [0.1653s] [ 9%] 2025-12-04T14:00:07.8377408Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int16 PASSED [0.0070s] [ 9%] 2025-12-04T14:00:07.8378157Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int32 PASSED [0.1651s] [ 9%] 2025-12-04T14:00:07.8378957Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int64 PASSED [0.0070s] [ 9%] 2025-12-04T14:00:07.8379798Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int8 PASSED [0.1652s] [ 9%] 2025-12-04T14:00:07.8380541Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_uint8 PASSED [0.0069s] [ 9%] 2025-12-04T14:00:07.8381310Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex128 PASSED [0.1689s] [ 9%] 2025-12-04T14:00:07.8382167Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex64 PASSED [0.0074s] [ 9%] 2025-12-04T14:00:07.8382987Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float32 PASSED [0.1655s] [ 9%] 2025-12-04T14:00:07.8383767Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float64 PASSED [0.0074s] [ 9%] 
2025-12-04T14:00:07.8384529Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int16 PASSED [0.1656s] [ 9%] 2025-12-04T14:00:07.8385288Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int32 PASSED [0.0070s] [ 9%] 2025-12-04T14:00:07.8386055Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int64 PASSED [0.1653s] [ 9%] 2025-12-04T14:00:07.8386815Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int8 PASSED [0.0071s] [ 9%] 2025-12-04T14:00:07.8387564Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_uint8 PASSED [0.1653s] [ 9%] 2025-12-04T14:00:07.8388339Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float32 PASSED [0.0074s] [ 9%] 2025-12-04T14:00:07.8389183Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float64 PASSED [0.1657s] [ 9%] 2025-12-04T14:00:07.8389963Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int16 PASSED [0.0070s] [ 10%] 2025-12-04T14:00:07.8390778Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int32 PASSED [0.1651s] [ 10%] 2025-12-04T14:00:07.8391555Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int64 PASSED [0.0071s] [ 10%] 2025-12-04T14:00:07.8392317Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int8 PASSED [0.1655s] [ 10%] 2025-12-04T14:00:07.8393119Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_uint8 PASSED [0.0070s] [ 10%] 2025-12-04T14:00:07.8393879Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex128 PASSED [0.1664s] [ 10%] 2025-12-04T14:00:07.8394629Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex64 PASSED [0.0074s] [ 10%] 2025-12-04T14:00:07.8395366Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float32 PASSED [0.1657s] [ 10%] 2025-12-04T14:00:07.8396098Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float64 PASSED [0.0073s] [ 10%] 
2025-12-04T14:00:07.8396828Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int16 PASSED [0.1655s] [ 10%] 2025-12-04T14:00:07.8397550Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int32 PASSED [0.0073s] [ 10%] 2025-12-04T14:00:07.8398260Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int64 PASSED [0.1678s] [ 10%] 2025-12-04T14:00:07.8398980Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int8 PASSED [0.0071s] [ 10%] 2025-12-04T14:00:07.8399692Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_uint8 PASSED [0.1660s] [ 10%] 2025-12-04T14:00:07.8400429Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex128 PASSED [0.0075s] [ 10%] 2025-12-04T14:00:07.8401208Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex64 PASSED [0.1655s] [ 10%] 2025-12-04T14:00:07.8401961Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float32 PASSED [0.0073s] [ 10%] 2025-12-04T14:00:07.8402700Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float64 PASSED [0.1658s] [ 10%] 2025-12-04T14:00:07.8403434Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int16 PASSED [0.0073s] [ 10%] 2025-12-04T14:00:07.8404155Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int32 PASSED [0.1652s] [ 10%] 2025-12-04T14:00:07.8404877Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int64 PASSED [0.0073s] [ 10%] 2025-12-04T14:00:07.8405601Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int8 PASSED [0.1662s] [ 10%] 2025-12-04T14:00:07.8406319Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_uint8 PASSED [0.0095s] [ 10%] 2025-12-04T14:00:07.8407142Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex128 PASSED [0.1662s] [ 10%] 2025-12-04T14:00:07.8408182Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex64 PASSED [0.0074s] [ 10%] 2025-12-04T14:00:07.8409119Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float32 PASSED [0.1658s] [ 10%] 2025-12-04T14:00:07.8409874Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float64 PASSED [0.0073s] [ 10%] 2025-12-04T14:00:07.8410655Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int16 PASSED [0.1660s] [ 10%] 2025-12-04T14:00:07.8411506Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int32 PASSED [0.0073s] [ 10%] 2025-12-04T14:00:07.8412350Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int64 PASSED [0.1659s] [ 10%] 2025-12-04T14:00:07.8413096Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int8 PASSED [0.0073s] [ 10%] 2025-12-04T14:00:07.8413848Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_uint8 PASSED [0.1661s] [ 11%] 2025-12-04T14:00:07.8414621Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex128 PASSED [0.0074s] [ 11%] 2025-12-04T14:00:07.8415409Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex64 PASSED [0.1660s] [ 11%] 2025-12-04T14:00:07.8416175Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float32 PASSED [0.0073s] [ 11%] 2025-12-04T14:00:07.8417027Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float64 PASSED [0.1664s] [ 11%] 2025-12-04T14:00:07.8417785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int16 PASSED [0.0073s] [ 11%] 2025-12-04T14:00:07.8418588Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int32 PASSED [0.1658s] [ 11%] 2025-12-04T14:00:07.8419460Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int64 PASSED [0.0073s] [ 11%] 2025-12-04T14:00:07.8420200Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int8 PASSED [0.1673s] [ 11%] 2025-12-04T14:00:07.8420947Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_uint8 PASSED [0.0073s] [ 11%] 2025-12-04T14:00:07.8421715Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex128 PASSED [0.1662s] [ 11%] 2025-12-04T14:00:07.8422505Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex64 PASSED [0.0074s] [ 11%] 2025-12-04T14:00:07.8423285Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float32 PASSED [0.1660s] [ 11%] 2025-12-04T14:00:07.8424051Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float64 PASSED [0.0073s] [ 11%] 2025-12-04T14:00:07.8424812Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int16 PASSED [0.1660s] [ 11%] 2025-12-04T14:00:07.8425565Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int32 PASSED [0.0072s] [ 11%] 2025-12-04T14:00:07.8426319Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int64 PASSED [0.1660s] [ 11%] 2025-12-04T14:00:07.8427068Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int8 PASSED [0.0073s] [ 11%] 2025-12-04T14:00:07.8427816Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_uint8 PASSED [0.0064s] [ 11%] 2025-12-04T14:00:07.8428574Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float32 PASSED [0.1664s] [ 11%] 2025-12-04T14:00:07.8429337Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float64 PASSED [0.0073s] [ 11%] 2025-12-04T14:00:07.8430195Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int16 PASSED [0.1663s] [ 11%] 2025-12-04T14:00:07.8430941Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int32 PASSED [0.0071s] [ 11%] 2025-12-04T14:00:07.8431688Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int64 PASSED [0.1661s] [ 11%] 2025-12-04T14:00:07.8432439Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int8 PASSED [0.0071s] [ 11%] 2025-12-04T14:00:07.8433256Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_uint8 PASSED [0.1662s] [ 11%] 2025-12-04T14:00:07.8434154Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex128 SKIPPED [0.0028s] (Skipped! Out not supported) [ 11%] 2025-12-04T14:00:07.8435250Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 11%] 2025-12-04T14:00:07.8436275Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float32 SKIPPED [0.0026s] (Skipped! Out not supported) [ 11%] 2025-12-04T14:00:07.8437283Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 11%] 2025-12-04T14:00:07.8438286Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 11%] 2025-12-04T14:00:07.8439324Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int32 SKIPPED [0.0026s] (Skipped! Out not supported) [ 12%] 2025-12-04T14:00:07.8440315Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 12%] 2025-12-04T14:00:07.8441301Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 12%] 2025-12-04T14:00:07.8442288Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_uint8 SKIPPED [0.0026s] (Skipped! 
Out not supported) [ 12%] 2025-12-04T14:00:07.8443258Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex128 PASSED [0.1673s] [ 12%] 2025-12-04T14:00:07.8444115Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex64 PASSED [0.0075s] [ 12%] 2025-12-04T14:00:07.8444994Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float32 PASSED [0.1672s] [ 12%] 2025-12-04T14:00:07.8445823Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float64 PASSED [0.0073s] [ 12%] 2025-12-04T14:00:07.8446641Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int16 PASSED [0.1662s] [ 12%] 2025-12-04T14:00:07.8447459Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int32 PASSED [0.0070s] [ 12%] 2025-12-04T14:00:07.8448266Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int64 PASSED [0.1660s] [ 12%] 2025-12-04T14:00:07.8449074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int8 PASSED [0.0071s] [ 12%] 2025-12-04T14:00:07.8449884Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_uint8 PASSED [0.1658s] [ 12%] 2025-12-04T14:00:07.8450682Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float32 PASSED [0.0073s] [ 12%] 2025-12-04T14:00:07.8451467Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float64 PASSED [0.1664s] [ 12%] 2025-12-04T14:00:07.8452241Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int16 PASSED [0.0073s] [ 12%] 2025-12-04T14:00:07.8453014Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int32 PASSED [0.1667s] [ 12%] 2025-12-04T14:00:07.8453776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int64 PASSED [0.0073s] [ 12%] 2025-12-04T14:00:07.8454540Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int8 PASSED [0.1662s] [ 12%] 2025-12-04T14:00:07.8455312Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_uint8 PASSED [0.0073s] [ 12%] 2025-12-04T14:00:07.8456083Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float32 PASSED [0.1668s] [ 12%] 2025-12-04T14:00:07.8456842Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float64 PASSED [0.0074s] [ 12%] 2025-12-04T14:00:07.8457607Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int16 PASSED [0.1667s] [ 12%] 2025-12-04T14:00:07.8458357Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int32 PASSED [0.0073s] [ 12%] 2025-12-04T14:00:07.8459168Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int64 PASSED [0.1663s] [ 12%] 2025-12-04T14:00:07.8459963Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int8 PASSED [0.0073s] [ 12%] 2025-12-04T14:00:07.8460716Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_uint8 PASSED [0.1665s] [ 12%] 2025-12-04T14:00:07.8461535Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float32 PASSED [0.0073s] [ 12%] 2025-12-04T14:00:07.8462324Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float64 PASSED [0.1663s] [ 12%] 2025-12-04T14:00:07.8463111Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int16 PASSED [0.0073s] [ 12%] 2025-12-04T14:00:07.8463895Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int32 PASSED [0.1647s] [ 12%] 2025-12-04T14:00:07.8464670Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int64 PASSED [0.0074s] [ 13%] 2025-12-04T14:00:07.8465435Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int8 PASSED [0.1671s] [ 13%] 2025-12-04T14:00:07.8466215Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_uint8 PASSED [0.0070s] [ 13%] 2025-12-04T14:00:07.8467013Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex128 PASSED [0.1669s] [ 13%] 2025-12-04T14:00:07.8467824Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex64 PASSED [0.0074s] [ 13%] 2025-12-04T14:00:07.8468645Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float32 PASSED [0.1662s] [ 13%] 2025-12-04T14:00:07.8469434Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float64 PASSED [0.0073s] [ 13%] 2025-12-04T14:00:07.8470209Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int16 PASSED [0.1664s] [ 13%] 2025-12-04T14:00:07.8471018Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int32 PASSED [0.0073s] [ 13%] 2025-12-04T14:00:07.8471777Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int64 PASSED [0.1661s] [ 13%] 2025-12-04T14:00:07.8472551Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int8 PASSED [0.0073s] [ 13%] 2025-12-04T14:00:07.8473329Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_uint8 PASSED [0.1667s] [ 13%] 2025-12-04T14:00:07.8474094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float32 PASSED [0.0073s] [ 13%] 2025-12-04T14:00:07.8474878Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float64 PASSED [0.1664s] [ 13%] 2025-12-04T14:00:07.8475660Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int16 PASSED [0.0071s] [ 13%] 2025-12-04T14:00:07.8476420Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int32 PASSED [0.1663s] [ 13%] 2025-12-04T14:00:07.8477175Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int64 PASSED [0.0071s] [ 13%] 2025-12-04T14:00:07.8477934Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int8 PASSED [0.1663s] [ 13%] 2025-12-04T14:00:07.8478699Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_uint8 PASSED [0.0071s] [ 13%] 2025-12-04T14:00:07.8479469Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float32 PASSED [0.1663s] [ 13%] 2025-12-04T14:00:07.8480239Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float64 PASSED [0.0073s] [ 13%] 2025-12-04T14:00:07.8481159Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex128 SKIPPED [0.0026s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8482230Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex64 SKIPPED [0.0028s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8483278Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8484304Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8485324Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int16 SKIPPED [0.0027s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8486384Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int32 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8487438Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8488439Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8489499Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8490541Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex128 SKIPPED [0.0025s] (Skipped! Out not supported) [ 13%] 2025-12-04T14:00:07.8491602Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 14%] 2025-12-04T14:00:07.8492640Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float32 SKIPPED [0.0028s] (Skipped! 
Out not supported) [ 14%] 2025-12-04T14:00:07.8493678Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 14%] 2025-12-04T14:00:07.8494700Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int16 SKIPPED [0.0025s] (Skipped! Out not supported) [ 14%] 2025-12-04T14:00:07.8495756Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 14%] 2025-12-04T14:00:07.8496760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 14%] 2025-12-04T14:00:07.8497839Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 14%] 2025-12-04T14:00:07.8498891Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_uint8 SKIPPED [0.0025s] (Skipped! Out not supported) [ 14%] 2025-12-04T14:00:07.8499874Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float32 PASSED [0.1814s] [ 14%] 2025-12-04T14:00:07.8500685Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float64 PASSED [0.0072s] [ 14%] 2025-12-04T14:00:07.8501488Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int16 PASSED [0.1669s] [ 14%] 2025-12-04T14:00:07.8502271Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int32 PASSED [0.0071s] [ 14%] 2025-12-04T14:00:07.8503065Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int64 PASSED [0.1666s] [ 14%] 2025-12-04T14:00:07.8503847Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int8 PASSED [0.0071s] [ 14%] 2025-12-04T14:00:07.8504647Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_uint8 PASSED [0.1668s] [ 14%] 2025-12-04T14:00:07.8505452Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float32 PASSED [0.0072s] [ 14%] 2025-12-04T14:00:07.8506246Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float64 PASSED [0.1665s] [ 14%] 2025-12-04T14:00:07.8507033Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int16 PASSED [0.0071s] [ 14%] 2025-12-04T14:00:07.8507973Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int32 PASSED [0.1668s] [ 14%] 2025-12-04T14:00:07.8508785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int64 PASSED [0.0071s] [ 14%] 2025-12-04T14:00:07.8509588Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int8 PASSED [0.1671s] [ 14%] 2025-12-04T14:00:07.8510367Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_uint8 PASSED [0.0071s] [ 14%] 2025-12-04T14:00:07.8511163Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex128 PASSED [0.1671s] [ 14%] 2025-12-04T14:00:07.8511965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex64 PASSED [0.0074s] [ 14%] 2025-12-04T14:00:07.8512751Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float32 PASSED [0.1669s] [ 14%] 2025-12-04T14:00:07.8513599Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float64 PASSED [0.0073s] [ 14%] 2025-12-04T14:00:07.8514426Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int16 PASSED [0.1670s] [ 14%] 2025-12-04T14:00:07.8515187Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int32 PASSED [0.0073s] [ 14%] 2025-12-04T14:00:07.8515944Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int64 PASSED [0.1670s] [ 14%] 2025-12-04T14:00:07.8516701Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int8 PASSED [0.0073s] [ 14%] 2025-12-04T14:00:07.8517461Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_uint8 PASSED [0.1669s] [ 14%] 2025-12-04T14:00:07.8518244Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float32 PASSED [0.0073s] [ 15%] 2025-12-04T14:00:07.8519054Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float64 PASSED [0.1674s] [ 15%] 2025-12-04T14:00:07.8519867Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int16 PASSED [0.0071s] [ 15%] 2025-12-04T14:00:07.8520660Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int32 PASSED [0.1671s] [ 15%] 2025-12-04T14:00:07.8521451Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int64 PASSED [0.0071s] [ 15%] 2025-12-04T14:00:07.8522299Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int8 PASSED [0.1669s] [ 15%] 2025-12-04T14:00:07.8523097Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_uint8 PASSED [0.0072s] [ 15%] 2025-12-04T14:00:07.8523892Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex128 PASSED [0.1676s] [ 15%] 2025-12-04T14:00:07.8524747Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex64 PASSED [0.0075s] [ 15%] 2025-12-04T14:00:07.8525512Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float32 PASSED [0.1673s] [ 15%] 2025-12-04T14:00:07.8526277Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float64 PASSED [0.0073s] [ 15%] 2025-12-04T14:00:07.8527029Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int16 PASSED [0.1677s] [ 15%] 2025-12-04T14:00:07.8527773Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int32 PASSED [0.0072s] [ 15%] 2025-12-04T14:00:07.8528512Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int64 PASSED [0.1672s] [ 15%] 2025-12-04T14:00:07.8529257Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int8 PASSED [0.0072s] [ 15%] 2025-12-04T14:00:07.8529989Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_uint8 PASSED [0.1679s] [ 15%] 2025-12-04T14:00:07.8530937Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float32 SKIPPED [0.0028s] (Skipped! 
Out not supported) [ 15%] 2025-12-04T14:00:07.8532074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float64 SKIPPED [0.0027s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8533203Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int16 SKIPPED [0.0025s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8534311Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int32 SKIPPED [0.0027s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8535427Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8536540Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8537656Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_uint8 SKIPPED [0.0028s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8538784Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex128 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8540010Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8541129Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float32 SKIPPED [0.0028s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8542194Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float64 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8543250Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int16 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8544283Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int32 SKIPPED [0.0028s] (Skipped! 
Out not supported) [ 15%] 2025-12-04T14:00:07.8545324Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int64 SKIPPED [0.0025s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8546362Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int8 SKIPPED [0.0026s] (Skipped! Out not supported) [ 15%] 2025-12-04T14:00:07.8547402Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_uint8 SKIPPED [0.0028s] (Skipped! Out not supported) [ 16%] 2025-12-04T14:00:07.8548316Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float32 PASSED [0.1678s] [ 16%] 2025-12-04T14:00:07.8549113Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float64 PASSED [0.0074s] [ 16%] 2025-12-04T14:00:07.8549941Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int16 PASSED [0.1677s] [ 16%] 2025-12-04T14:00:07.8550719Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int32 PASSED [0.0073s] [ 16%] 2025-12-04T14:00:07.8551527Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int64 PASSED [0.1674s] [ 16%] 2025-12-04T14:00:07.8552294Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int8 PASSED [0.0073s] [ 16%] 2025-12-04T14:00:07.8553061Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_uint8 PASSED [0.1675s] [ 16%] 2025-12-04T14:00:07.8553833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float32 PASSED [0.0073s] [ 16%] 2025-12-04T14:00:07.8554607Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float64 PASSED [0.1675s] [ 16%] 2025-12-04T14:00:07.8555367Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int16 PASSED [0.0071s] [ 16%] 2025-12-04T14:00:07.8556125Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int32 PASSED [0.1673s] [ 16%] 2025-12-04T14:00:07.8556872Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int64 PASSED [0.0071s] [ 16%] 
2025-12-04T14:00:07.8557631Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int8 PASSED [0.1670s] [ 16%] 2025-12-04T14:00:07.8558387Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_uint8 PASSED [0.0071s] [ 16%] 2025-12-04T14:00:07.8559159Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex128 PASSED [0.1676s] [ 16%] 2025-12-04T14:00:07.8559934Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex64 PASSED [0.0074s] [ 16%] 2025-12-04T14:00:07.8560696Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float32 PASSED [0.1676s] [ 16%] 2025-12-04T14:00:07.8561452Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float64 PASSED [0.0073s] [ 16%] 2025-12-04T14:00:07.8562203Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int16 PASSED [0.1673s] [ 16%] 2025-12-04T14:00:07.8562950Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int32 PASSED [0.0071s] [ 16%] 2025-12-04T14:00:07.8563690Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int64 PASSED [0.1675s] [ 16%] 2025-12-04T14:00:07.8564435Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int8 PASSED [0.0071s] [ 16%] 2025-12-04T14:00:07.8565172Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_uint8 PASSED [0.1671s] [ 16%] 2025-12-04T14:00:07.8565933Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float32 PASSED [0.0074s] [ 16%] 2025-12-04T14:00:07.8566762Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float64 PASSED [0.1675s] [ 16%] 2025-12-04T14:00:07.8567563Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int16 PASSED [0.0071s] [ 16%] 2025-12-04T14:00:07.8568311Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int32 PASSED [0.1676s] [ 16%] 2025-12-04T14:00:07.8569058Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int64 PASSED [0.0071s] [ 16%] 2025-12-04T14:00:07.8569804Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int8 PASSED [0.1675s] [ 16%] 2025-12-04T14:00:07.8570552Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_uint8 PASSED [0.0071s] [ 16%] 2025-12-04T14:00:07.8571316Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float32 PASSED [0.1675s] [ 17%] 2025-12-04T14:00:07.8572108Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float64 PASSED [0.0072s] [ 17%] 2025-12-04T14:00:07.8572893Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int16 PASSED [0.1674s] [ 17%] 2025-12-04T14:00:07.8573666Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int32 PASSED [0.0071s] [ 17%] 2025-12-04T14:00:07.8574441Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int64 PASSED [0.1672s] [ 17%] 2025-12-04T14:00:07.8575258Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int8 PASSED [0.0071s] [ 17%] 2025-12-04T14:00:07.8576032Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_uint8 PASSED [0.1676s] [ 17%] 2025-12-04T14:00:07.8576805Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex128 PASSED [0.0074s] [ 17%] 2025-12-04T14:00:07.8577626Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex64 PASSED [0.1679s] [ 17%] 2025-12-04T14:00:07.8578393Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float32 PASSED [0.0073s] [ 17%] 2025-12-04T14:00:07.8579271Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float64 PASSED [0.1681s] [ 17%] 2025-12-04T14:00:07.8580021Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int16 PASSED [0.0074s] [ 17%] 2025-12-04T14:00:07.8580771Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int32 PASSED [0.1674s] [ 17%] 2025-12-04T14:00:07.8581520Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int64 PASSED [0.0073s] [ 17%] 2025-12-04T14:00:07.8582254Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int8 PASSED [0.1678s] [ 17%] 2025-12-04T14:00:07.8582998Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_uint8 PASSED [0.0073s] [ 17%] 2025-12-04T14:00:07.8583774Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex128 PASSED [0.1677s] [ 17%] 2025-12-04T14:00:07.8584560Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex64 PASSED [0.0074s] [ 17%] 2025-12-04T14:00:07.8585327Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float32 PASSED [0.1678s] [ 17%] 2025-12-04T14:00:07.8586095Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float64 PASSED [0.0073s] [ 17%] 2025-12-04T14:00:07.8586852Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int16 PASSED [0.1678s] [ 17%] 2025-12-04T14:00:07.8587601Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int32 PASSED [0.0073s] [ 17%] 2025-12-04T14:00:07.8588343Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int64 PASSED [0.1678s] [ 17%] 2025-12-04T14:00:07.8589094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int8 PASSED [0.0073s] [ 17%] 2025-12-04T14:00:07.8589857Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_uint8 PASSED [0.1678s] [ 17%] 2025-12-04T14:00:07.8590636Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex128 PASSED [0.0074s] [ 17%] 2025-12-04T14:00:07.8591424Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex64 PASSED [0.1676s] [ 17%] 2025-12-04T14:00:07.8592253Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float32 PASSED [0.0074s] [ 17%] 2025-12-04T14:00:07.8593023Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float64 PASSED [0.1679s] [ 17%] 2025-12-04T14:00:07.8593820Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int16 PASSED [0.0074s] [ 17%] 2025-12-04T14:00:07.8594584Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int32 PASSED [0.1678s] [ 17%] 2025-12-04T14:00:07.8595329Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int64 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8596082Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int8 PASSED [0.1681s] [ 18%] 2025-12-04T14:00:07.8596833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_uint8 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8597597Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex128 PASSED [0.1682s] [ 18%] 2025-12-04T14:00:07.8598386Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex64 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8599204Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float32 PASSED [0.1682s] [ 18%] 2025-12-04T14:00:07.8599965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float64 PASSED [0.0074s] [ 18%] 2025-12-04T14:00:07.8600721Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int16 PASSED [0.1675s] [ 18%] 2025-12-04T14:00:07.8601533Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int32 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8602271Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int64 PASSED [0.1679s] [ 18%] 2025-12-04T14:00:07.8603049Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int8 PASSED [0.0074s] [ 18%] 2025-12-04T14:00:07.8603787Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_uint8 PASSED [0.1681s] [ 18%] 2025-12-04T14:00:07.8604566Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex128 PASSED [0.0074s] [ 18%] 2025-12-04T14:00:07.8605354Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex64 PASSED [0.1681s] [ 18%] 2025-12-04T14:00:07.8606139Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float32 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8606905Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float64 PASSED [0.1681s] [ 18%] 2025-12-04T14:00:07.8607664Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int16 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8608669Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int32 PASSED [0.1687s] [ 18%] 2025-12-04T14:00:07.8609397Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int64 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8610127Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int8 PASSED [0.1677s] [ 18%] 2025-12-04T14:00:07.8610856Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_uint8 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8611592Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float32 PASSED [0.1681s] [ 18%] 2025-12-04T14:00:07.8612345Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float64 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8613093Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int16 PASSED [0.1680s] [ 18%] 2025-12-04T14:00:07.8613838Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int32 PASSED [0.0071s] [ 18%] 2025-12-04T14:00:07.8614580Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int64 PASSED [0.1679s] [ 18%] 2025-12-04T14:00:07.8615319Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int8 PASSED [0.0071s] [ 18%] 2025-12-04T14:00:07.8616056Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_uint8 PASSED [0.1680s] [ 18%] 2025-12-04T14:00:07.8616874Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex128 PASSED [0.0084s] [ 18%] 2025-12-04T14:00:07.8617783Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex64 PASSED [0.1684s] [ 18%] 2025-12-04T14:00:07.8618785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float32 PASSED [0.0073s] [ 18%] 2025-12-04T14:00:07.8619807Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float64 PASSED [0.1683s] [ 19%] 2025-12-04T14:00:07.8620680Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int16 PASSED [0.0071s] [ 19%] 2025-12-04T14:00:07.8621545Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int32 PASSED [0.1680s] [ 19%] 2025-12-04T14:00:07.8622402Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int64 PASSED [0.0071s] [ 19%] 2025-12-04T14:00:07.8623252Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int8 PASSED [0.1681s] [ 19%] 2025-12-04T14:00:07.8624095Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_uint8 PASSED [0.0071s] [ 19%] 2025-12-04T14:00:07.8624985Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex128 PASSED [0.1689s] [ 19%] 2025-12-04T14:00:07.8625895Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex64 PASSED [0.0074s] [ 19%] 2025-12-04T14:00:07.8626792Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float32 PASSED [0.1683s] [ 19%] 2025-12-04T14:00:07.8627726Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float64 PASSED [0.0073s] [ 19%] 2025-12-04T14:00:07.8628639Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int16 PASSED [0.1684s] [ 19%] 2025-12-04T14:00:07.8630356Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int32 PASSED [0.0073s] [ 19%] 2025-12-04T14:00:07.8631216Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int64 PASSED [0.1683s] [ 19%] 2025-12-04T14:00:07.8632088Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int8 PASSED [0.0073s] [ 19%] 2025-12-04T14:00:07.8632969Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_uint8 
PASSED [0.1682s] [ 19%] 2025-12-04T14:00:07.8633875Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex128 PASSED [0.0073s] [ 19%] 2025-12-04T14:00:07.8634802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex64 PASSED [0.1682s] [ 19%] 2025-12-04T14:00:07.8635712Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float32 PASSED [0.0072s] [ 19%] 2025-12-04T14:00:07.8636613Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float64 PASSED [0.1683s] [ 19%] 2025-12-04T14:00:07.8637510Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int16 PASSED [0.0073s] [ 19%] 2025-12-04T14:00:07.8638380Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int32 PASSED [0.1684s] [ 19%] 2025-12-04T14:00:07.8639318Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int64 PASSED [0.0072s] [ 19%] 2025-12-04T14:00:07.8640196Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int8 PASSED [0.1681s] [ 19%] 2025-12-04T14:00:07.8641080Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_uint8 PASSED [0.0072s] [ 19%] 2025-12-04T14:00:07.8641966Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex128 PASSED [0.1684s] [ 19%] 2025-12-04T14:00:07.8642881Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex64 PASSED [0.0073s] [ 19%] 2025-12-04T14:00:07.8643791Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float32 PASSED [0.1682s] [ 19%] 2025-12-04T14:00:07.8644679Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float64 PASSED [0.0072s] [ 19%] 2025-12-04T14:00:07.8645608Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int16 PASSED [0.1686s] [ 19%] 2025-12-04T14:00:07.8646482Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int32 PASSED [0.0072s] [ 19%] 2025-12-04T14:00:07.8647398Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int64 PASSED [0.1684s] [ 19%] 2025-12-04T14:00:07.8648257Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int8 PASSED [0.0072s] [ 20%] 2025-12-04T14:00:07.8649103Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_uint8 PASSED [0.1685s] [ 20%] 2025-12-04T14:00:07.8650008Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex128 PASSED [0.0073s] [ 20%] 2025-12-04T14:00:07.8650940Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex64 PASSED [0.1684s] [ 20%] 2025-12-04T14:00:07.8651849Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float32 PASSED [0.0072s] [ 20%] 2025-12-04T14:00:07.8652744Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float64 PASSED [0.1684s] [ 20%] 2025-12-04T14:00:07.8653629Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int16 PASSED [0.0072s] [ 20%] 2025-12-04T14:00:07.8654563Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int32 PASSED [0.1687s] [ 20%] 2025-12-04T14:00:07.8655436Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int64 PASSED [0.0072s] [ 20%] 2025-12-04T14:00:07.8656295Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int8 PASSED [0.1687s] [ 20%] 2025-12-04T14:00:07.8666401Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_uint8 PASSED [0.0070s] [ 20%] 2025-12-04T14:00:07.8667295Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float32 PASSED [0.1685s] [ 20%] 2025-12-04T14:00:07.8668179Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float64 PASSED [0.0072s] [ 20%] 2025-12-04T14:00:07.8669104Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int16 PASSED [0.1686s] [ 20%] 2025-12-04T14:00:07.8669961Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int32 PASSED [0.0070s] [ 20%] 2025-12-04T14:00:07.8670817Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int64 PASSED [0.1683s] [ 20%] 2025-12-04T14:00:07.8671666Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int8 PASSED [0.0070s] [ 20%] 2025-12-04T14:00:07.8672513Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_uint8 PASSED [0.1685s] [ 20%] 2025-12-04T14:00:07.8673393Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex128 PASSED [0.0074s] [ 20%] 2025-12-04T14:00:07.8674293Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex64 PASSED [0.1688s] [ 20%] 2025-12-04T14:00:07.8675178Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float32 PASSED [0.0072s] [ 20%] 2025-12-04T14:00:07.8676050Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float64 PASSED [0.1684s] [ 20%] 2025-12-04T14:00:07.8676916Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int16 PASSED [0.0069s] [ 20%] 2025-12-04T14:00:07.8677776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int32 PASSED [0.1685s] [ 20%] 2025-12-04T14:00:07.8678622Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int64 PASSED [0.0069s] [ 20%] 2025-12-04T14:00:07.8679475Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int8 PASSED [0.1678s] [ 20%] 2025-12-04T14:00:07.8680320Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_uint8 
PASSED [0.0070s] [ 20%] 2025-12-04T14:00:07.8681326Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex128 PASSED [0.1689s] [ 20%] 2025-12-04T14:00:07.8682351Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex64 PASSED [0.0073s] [ 20%] 2025-12-04T14:00:07.8683322Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float32 PASSED [0.1687s] [ 20%] 2025-12-04T14:00:07.8684279Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float64 PASSED [0.0072s] [ 20%] 2025-12-04T14:00:07.8685226Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int16 PASSED [0.1686s] [ 21%] 2025-12-04T14:00:07.8686161Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int32 PASSED [0.0070s] [ 21%] 2025-12-04T14:00:07.8687094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int64 PASSED [0.1686s] [ 21%] 2025-12-04T14:00:07.8688029Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int8 PASSED [0.0070s] [ 21%] 2025-12-04T14:00:07.8689016Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_uint8 PASSED [0.1686s] [ 21%] 2025-12-04T14:00:07.8689931Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float32 PASSED [0.0072s] [ 21%] 2025-12-04T14:00:07.8690879Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float64 PASSED [0.1685s] [ 21%] 2025-12-04T14:00:07.8691778Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int16 PASSED [0.0072s] [ 21%] 2025-12-04T14:00:07.8692703Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int32 PASSED [0.1687s] [ 21%] 2025-12-04T14:00:07.8693579Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int64 PASSED [0.0073s] [ 21%] 2025-12-04T14:00:07.8694456Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int8 PASSED [0.1690s] [ 21%] 2025-12-04T14:00:07.8695331Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_uint8 PASSED [0.0073s] [ 21%] 2025-12-04T14:00:07.8696213Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float32 PASSED [0.1688s] [ 21%] 2025-12-04T14:00:07.8697076Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float64 PASSED [0.0072s] [ 21%] 2025-12-04T14:00:07.8697928Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int16 PASSED [0.1682s] [ 21%] 2025-12-04T14:00:07.8698780Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int32 PASSED [0.0073s] [ 21%] 2025-12-04T14:00:07.8699705Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int64 PASSED [0.1690s] [ 21%] 2025-12-04T14:00:07.8700539Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int8 PASSED [0.0072s] [ 21%] 2025-12-04T14:00:07.8701380Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_uint8 PASSED [0.1688s] [ 21%] 2025-12-04T14:00:07.8702243Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float32 PASSED [0.0072s] [ 21%] 2025-12-04T14:00:07.8703128Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float64 PASSED [0.1690s] [ 21%] 2025-12-04T14:00:07.8704009Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int16 PASSED [0.0072s] [ 21%] 2025-12-04T14:00:07.8704878Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int32 PASSED [0.1691s] [ 21%] 2025-12-04T14:00:07.8705754Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int64 PASSED [0.0072s] [ 21%] 2025-12-04T14:00:07.8706623Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int8 PASSED [0.1688s] [ 21%] 2025-12-04T14:00:07.8707569Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_uint8 PASSED [0.0070s] [ 21%] 2025-12-04T14:00:07.8708700Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex128 PASSED [0.1692s] [ 21%] 2025-12-04T14:00:07.8709696Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex64 PASSED [0.0073s] [ 21%] 2025-12-04T14:00:07.8710586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float32 PASSED [0.1686s] [ 21%] 2025-12-04T14:00:07.8711470Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float64 PASSED [0.0073s] [ 21%] 2025-12-04T14:00:07.8712384Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int16 PASSED [0.1689s] [ 21%] 2025-12-04T14:00:07.8713471Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int32 PASSED [0.0072s] [ 22%] 2025-12-04T14:00:07.8714334Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int64 PASSED [0.1681s] [ 22%] 2025-12-04T14:00:07.8715256Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int8 PASSED [0.0072s] [ 22%] 2025-12-04T14:00:07.8716173Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_uint8 PASSED [0.1682s] [ 22%] 2025-12-04T14:00:07.8717196Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float32 PASSED [0.0072s] [ 22%] 2025-12-04T14:00:07.8718079Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float64 PASSED [0.1682s] [ 22%] 2025-12-04T14:00:07.8718948Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int16 PASSED [0.0070s] [ 22%] 2025-12-04T14:00:07.8719869Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int32 PASSED [0.1681s] [ 22%] 2025-12-04T14:00:07.8720731Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int64 PASSED [0.0071s] [ 22%] 2025-12-04T14:00:07.8721585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int8 PASSED [0.1674s] [ 22%] 2025-12-04T14:00:07.8722442Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_uint8 PASSED [0.0071s] [ 22%] 2025-12-04T14:00:07.8723308Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float32 PASSED [0.1682s] [ 22%] 2025-12-04T14:00:07.8724177Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float64 PASSED [0.0072s] [ 22%] 2025-12-04T14:00:07.8725068Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex128 PASSED [0.1684s] [ 22%] 2025-12-04T14:00:07.8725977Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex64 PASSED [0.0072s] [ 22%] 2025-12-04T14:00:07.8726871Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float32 PASSED [0.1679s] [ 22%] 2025-12-04T14:00:07.8727760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float64 PASSED [0.0071s] [ 22%] 2025-12-04T14:00:07.8728634Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int16 PASSED [0.1690s] [ 22%] 2025-12-04T14:00:07.8729498Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int32 PASSED [0.0071s] [ 22%] 2025-12-04T14:00:07.8730361Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int64 PASSED [0.1686s] [ 22%] 2025-12-04T14:00:07.8731214Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int8 PASSED [0.0071s] [ 22%] 2025-12-04T14:00:07.8732074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_uint8 PASSED [0.1686s] [ 22%] 2025-12-04T14:00:07.8732961Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex128 PASSED [0.0071s] [ 22%] 2025-12-04T14:00:07.8733866Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex64 PASSED [0.1693s] [ 22%] 2025-12-04T14:00:07.8734820Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float32 PASSED [0.0071s] [ 22%] 2025-12-04T14:00:07.8735750Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float64 PASSED [0.1691s] [ 22%] 2025-12-04T14:00:07.8736622Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int16 PASSED [0.0071s] [ 22%] 2025-12-04T14:00:07.8737487Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int32 PASSED [0.1682s] [ 22%] 2025-12-04T14:00:07.8738342Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int64 PASSED [0.0070s] [ 22%] 2025-12-04T14:00:07.8739262Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int8 PASSED [0.1691s] [ 22%] 2025-12-04T14:00:07.8740126Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_uint8 PASSED [0.0070s] [ 22%] 2025-12-04T14:00:07.8741017Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float32 PASSED [0.1688s] [ 23%] 2025-12-04T14:00:07.8741923Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float64 PASSED [0.0072s] [ 23%] 2025-12-04T14:00:07.8742823Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int16 PASSED [0.1691s] [ 23%] 2025-12-04T14:00:07.8743759Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int32 PASSED [0.0071s] [ 23%] 2025-12-04T14:00:07.8744650Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int64 PASSED [0.1693s] [ 23%] 2025-12-04T14:00:07.8745572Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int8 PASSED [0.0071s] [ 23%] 2025-12-04T14:00:07.8746455Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_uint8 PASSED [0.1694s] [ 23%] 2025-12-04T14:00:07.8747353Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float32 PASSED [0.0071s] [ 23%] 2025-12-04T14:00:07.8748263Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float64 PASSED [0.1693s] [ 23%] 2025-12-04T14:00:07.8749165Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int16 PASSED [0.0071s] [ 23%] 2025-12-04T14:00:07.8750057Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int32 PASSED [0.1695s] [ 23%] 2025-12-04T14:00:07.8750948Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int64 PASSED [0.0071s] [ 23%] 2025-12-04T14:00:07.8751833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int8 PASSED [0.1687s] [ 23%] 2025-12-04T14:00:07.8752721Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_uint8 PASSED [0.0072s] [ 23%] 2025-12-04T14:00:07.8753617Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex128 PASSED [0.1690s] [ 23%] 2025-12-04T14:00:07.8754526Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex64 PASSED [0.0073s] [ 23%] 2025-12-04T14:00:07.8755417Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float32 PASSED [0.1696s] [ 23%] 2025-12-04T14:00:07.8756292Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float64 PASSED [0.0072s] [ 23%] 2025-12-04T14:00:07.8757161Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int16 PASSED [0.1692s] [ 23%] 2025-12-04T14:00:07.8758027Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int32 PASSED [0.0072s] [ 23%] 2025-12-04T14:00:07.8758942Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int64 PASSED [0.1696s] [ 23%] 2025-12-04T14:00:07.8759793Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int8 PASSED [0.0072s] [ 23%] 2025-12-04T14:00:07.8760704Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_uint8 PASSED [0.1693s] [ 23%] 2025-12-04T14:00:07.8761600Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float32 PASSED [0.0072s] [ 23%] 2025-12-04T14:00:07.8762566Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float64 PASSED [0.1695s] [ 23%] 2025-12-04T14:00:07.8763481Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int16 PASSED [0.0071s] [ 23%] 2025-12-04T14:00:07.8764388Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int32 PASSED [0.1695s] [ 23%] 2025-12-04T14:00:07.8765296Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int64 PASSED [0.0071s] [ 23%] 2025-12-04T14:00:07.8766190Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int8 PASSED [0.1697s] [ 23%] 2025-12-04T14:00:07.8767092Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_uint8 PASSED [0.0070s] [ 23%] 2025-12-04T14:00:07.8767991Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex128 PASSED [0.1695s] [ 23%] 2025-12-04T14:00:07.8768885Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex64 PASSED [0.0073s] [ 24%] 2025-12-04T14:00:07.8769798Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float32 PASSED [0.1698s] [ 24%] 2025-12-04T14:00:07.8770666Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float64 PASSED [0.0073s] [ 24%] 2025-12-04T14:00:07.8771516Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int16 PASSED [0.1694s] [ 24%] 2025-12-04T14:00:07.8772404Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int32 PASSED [0.0071s] [ 24%] 2025-12-04T14:00:07.8773240Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int64 PASSED [0.1697s] [ 24%] 2025-12-04T14:00:07.8774083Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int8 PASSED [0.0071s] [ 24%] 2025-12-04T14:00:07.8774925Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_uint8 PASSED [0.1695s] [ 24%] 2025-12-04T14:00:07.8775854Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float32 PASSED [0.0072s] [ 24%] 2025-12-04T14:00:07.8776851Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float64 PASSED [0.1697s] [ 24%] 2025-12-04T14:00:07.8777841Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int16 PASSED [0.0070s] [ 24%] 2025-12-04T14:00:07.8778870Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int32 PASSED [0.1695s] [ 24%] 2025-12-04T14:00:07.8779902Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int64 PASSED [0.0069s] [ 24%] 2025-12-04T14:00:07.8780874Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int8 PASSED [0.1696s] [ 24%] 2025-12-04T14:00:07.8781849Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_uint8 PASSED [0.0069s] [ 24%] 2025-12-04T14:00:07.8782808Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex128 PASSED [0.1689s] [ 24%] 2025-12-04T14:00:07.8783750Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex64 PASSED [0.0073s] [ 24%] 2025-12-04T14:00:07.8784669Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float32 PASSED [0.1698s] [ 24%] 2025-12-04T14:00:07.8785581Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float64 PASSED [0.0072s] [ 24%] 2025-12-04T14:00:07.8786487Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int16 PASSED [0.1696s] [ 24%] 2025-12-04T14:00:07.8787426Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int32 PASSED [0.0070s] [ 24%] 2025-12-04T14:00:07.8788431Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int64 PASSED [0.1694s] [ 24%] 2025-12-04T14:00:07.8789368Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int8 PASSED [0.0069s] [ 24%] 2025-12-04T14:00:07.8790261Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_uint8 PASSED [0.1695s] [ 24%] 2025-12-04T14:00:07.8791157Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float32 PASSED [0.0072s] [ 24%] 2025-12-04T14:00:07.8792057Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float64 PASSED [0.1697s] [ 24%] 2025-12-04T14:00:07.8792946Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int16 PASSED [0.0072s] [ 24%] 2025-12-04T14:00:07.8793828Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int32 PASSED [0.1694s] [ 24%] 2025-12-04T14:00:07.8794819Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int64 PASSED [0.0073s] [ 24%] 2025-12-04T14:00:07.8795699Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int8 PASSED [0.1699s] [ 24%] 2025-12-04T14:00:07.8796639Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_uint8 PASSED [0.0072s] [ 24%] 2025-12-04T14:00:07.8797522Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float32 PASSED [0.1700s] [ 25%] 2025-12-04T14:00:07.8798472Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float64 PASSED [0.0073s] [ 25%] 2025-12-04T14:00:07.8799393Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int16 PASSED [0.1689s] [ 25%] 2025-12-04T14:00:07.8800259Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int32 PASSED [0.0070s] [ 25%] 2025-12-04T14:00:07.8801123Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int64 PASSED [0.1699s] [ 25%] 2025-12-04T14:00:07.8801986Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int8 PASSED [0.0070s] [ 25%] 2025-12-04T14:00:07.8802840Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_uint8 PASSED [0.1697s] [ 25%] 2025-12-04T14:00:07.8803723Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex128 PASSED [0.0073s] [ 25%] 2025-12-04T14:00:07.8804611Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex64 PASSED [0.1702s] [ 25%] 2025-12-04T14:00:07.8805484Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float32 PASSED [0.0072s] [ 25%] 2025-12-04T14:00:07.8806349Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float64 PASSED [0.1701s] [ 25%] 2025-12-04T14:00:07.8807207Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int16 PASSED [0.0071s] [ 25%] 2025-12-04T14:00:07.8808312Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int32 PASSED [0.1697s] [ 25%] 2025-12-04T14:00:07.8809212Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int64 PASSED [0.0070s] [ 25%] 2025-12-04T14:00:07.8810055Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int8 PASSED [0.1699s] [ 25%] 2025-12-04T14:00:07.8810896Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_uint8 PASSED [0.0070s] [ 25%] 2025-12-04T14:00:07.8811757Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float32 PASSED [0.1698s] [ 25%] 2025-12-04T14:00:07.8812625Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float64 PASSED [0.0073s] [ 25%] 2025-12-04T14:00:07.8813490Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int16 PASSED [0.1697s] [ 25%] 2025-12-04T14:00:07.8814437Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int32 PASSED [0.0070s] [ 25%] 2025-12-04T14:00:07.8815343Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int64 PASSED [0.1700s] [ 25%] 2025-12-04T14:00:07.8816188Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int8 PASSED [0.0070s] [ 25%] 2025-12-04T14:00:07.8817039Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_uint8 PASSED [0.1701s] [ 25%] 2025-12-04T14:00:07.8817914Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float32 PASSED [0.0071s] [ 25%] 2025-12-04T14:00:07.8818821Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float64 PASSED [0.1698s] [ 25%] 2025-12-04T14:00:07.8819799Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int16 
PASSED [0.0071s] [ 25%] 2025-12-04T14:00:07.8820681Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int32 PASSED [0.1698s] [ 25%] 2025-12-04T14:00:07.8821562Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int64 PASSED [0.0071s] [ 25%] 2025-12-04T14:00:07.8822439Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int8 PASSED [0.1700s] [ 25%] 2025-12-04T14:00:07.8823374Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_uint8 PASSED [0.0071s] [ 25%] 2025-12-04T14:00:07.8824259Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex128 PASSED [0.1701s] [ 25%] 2025-12-04T14:00:07.8825196Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex64 PASSED [0.0073s] [ 26%] 2025-12-04T14:00:07.8826072Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float32 PASSED [0.1705s] [ 26%] 2025-12-04T14:00:07.8826929Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float64 PASSED [0.0072s] [ 26%] 2025-12-04T14:00:07.8827783Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int16 PASSED [0.1704s] [ 26%] 2025-12-04T14:00:07.8828630Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int32 PASSED [0.0074s] [ 26%] 2025-12-04T14:00:07.8829473Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int64 PASSED [0.1704s] [ 26%] 2025-12-04T14:00:07.8830308Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int8 PASSED [0.0074s] [ 26%] 2025-12-04T14:00:07.8831143Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_uint8 PASSED [0.1701s] [ 26%] 2025-12-04T14:00:07.8832014Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex128 PASSED [0.0073s] [ 26%] 2025-12-04T14:00:07.8832917Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex64 PASSED [0.1701s] [ 26%] 2025-12-04T14:00:07.8833809Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float32 PASSED [0.0072s] [ 26%] 2025-12-04T14:00:07.8834685Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float64 PASSED [0.1703s] [ 26%] 2025-12-04T14:00:07.8835546Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int16 PASSED [0.0072s] [ 26%] 2025-12-04T14:00:07.8836393Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int32 PASSED [0.1703s] [ 26%] 2025-12-04T14:00:07.8837242Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int64 PASSED [0.0072s] [ 26%] 2025-12-04T14:00:07.8838092Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int8 PASSED [0.1699s] [ 26%] 2025-12-04T14:00:07.8838939Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_uint8 PASSED [0.0073s] [ 26%] 2025-12-04T14:00:07.8839813Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex128 PASSED [0.1703s] [ 26%] 2025-12-04T14:00:07.8840765Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex64 PASSED [0.0073s] [ 26%] 2025-12-04T14:00:07.8841692Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float32 PASSED [0.1700s] [ 26%] 2025-12-04T14:00:07.8842571Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float64 PASSED [0.0072s] [ 26%] 2025-12-04T14:00:07.8843431Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int16 PASSED [0.1705s] [ 26%] 2025-12-04T14:00:07.8844288Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int32 PASSED [0.0072s] [ 26%] 2025-12-04T14:00:07.8845146Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int64 PASSED [0.1705s] [ 26%] 2025-12-04T14:00:07.8846000Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int8 PASSED [0.0072s] [ 26%] 2025-12-04T14:00:07.8846852Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_uint8 PASSED [0.1708s] [ 26%] 2025-12-04T14:00:07.8847729Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex128 PASSED [0.0073s] [ 26%] 2025-12-04T14:00:07.8848623Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex64 PASSED [0.1705s] [ 26%] 2025-12-04T14:00:07.8849540Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float32 PASSED [0.0072s] [ 26%] 2025-12-04T14:00:07.8850406Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float64 PASSED [0.1708s] [ 26%] 2025-12-04T14:00:07.8851302Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int16 PASSED [0.0072s] [ 26%] 2025-12-04T14:00:07.8852147Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int32 PASSED [0.1705s] [ 27%] 2025-12-04T14:00:07.8852987Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int64 PASSED [0.0072s] [ 27%] 2025-12-04T14:00:07.8853823Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int8 PASSED [0.1703s] [ 27%] 2025-12-04T14:00:07.8854665Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_uint8 PASSED [0.0072s] [ 27%] 2025-12-04T14:00:07.8855538Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex128 PASSED [0.1709s] [ 27%] 2025-12-04T14:00:07.8856434Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex64 PASSED [0.0073s] [ 27%] 2025-12-04T14:00:07.8857312Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float32 
PASSED [0.1705s] [ 27%] 2025-12-04T14:00:07.8858190Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float64 PASSED [0.0073s] [ 27%] 2025-12-04T14:00:07.8859158Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int16 PASSED [0.1704s] [ 27%] 2025-12-04T14:00:07.8860009Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int32 PASSED [0.0073s] [ 27%] 2025-12-04T14:00:07.8860865Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int64 PASSED [0.1706s] [ 27%] 2025-12-04T14:00:07.8861717Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int8 PASSED [0.0073s] [ 27%] 2025-12-04T14:00:07.8862569Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_uint8 PASSED [0.1708s] [ 27%] 2025-12-04T14:00:07.8863432Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float32 PASSED [0.0073s] [ 27%] 2025-12-04T14:00:07.8864320Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float64 PASSED [0.1707s] [ 27%] 2025-12-04T14:00:07.8865189Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int16 PASSED [0.0071s] [ 27%] 2025-12-04T14:00:07.8866059Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int32 PASSED [0.1708s] [ 27%] 2025-12-04T14:00:07.8866972Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int64 PASSED [0.0071s] [ 27%] 2025-12-04T14:00:07.8867870Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int8 PASSED [0.1702s] [ 27%] 2025-12-04T14:00:07.8868730Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_uint8 PASSED [0.0071s] [ 27%] 2025-12-04T14:00:07.8869581Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_complex128 PASSED [0.7683s] [ 27%] 2025-12-04T14:00:07.8870402Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_float64 PASSED [0.5865s] [ 27%] 2025-12-04T14:00:07.8871414Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%] 2025-12-04T14:00:07.8872591Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%] 2025-12-04T14:00:07.8873776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%] 2025-12-04T14:00:07.8874961Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%] 2025-12-04T14:00:07.8876187Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%] 2025-12-04T14:00:07.8877364Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%] 2025-12-04T14:00:07.8878586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 27%] 2025-12-04T14:00:07.8879774Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_float64 SKIPPED [0.0002s] (Skipped! 
sparse backward not supported) [ 27%] 2025-12-04T14:00:07.8880773Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_ceil_cuda_float64 PASSED [0.4675s] [ 27%] 2025-12-04T14:00:07.8881614Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_complex128 PASSED [0.2537s] [ 28%] 2025-12-04T14:00:07.8882460Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_float64 PASSED [0.0341s] [ 28%] 2025-12-04T14:00:07.8883338Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_complex128 PASSED [1.4354s] [ 28%] 2025-12-04T14:00:07.8884249Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_float64 PASSED [0.4534s] [ 28%] 2025-12-04T14:00:07.8885120Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_deg2rad_cuda_float64 PASSED [0.5208s] [ 28%] 2025-12-04T14:00:07.8886120Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erf_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%] 2025-12-04T14:00:07.8887116Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erfinv_cuda_float64 PASSED [0.6833s] [ 28%] 2025-12-04T14:00:07.8888133Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%] 2025-12-04T14:00:07.8889321Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%] 2025-12-04T14:00:07.8890325Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_floor_cuda_float64 PASSED [0.4654s] [ 28%] 2025-12-04T14:00:07.8891170Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_frac_cuda_float64 PASSED [0.4871s] [ 28%] 2025-12-04T14:00:07.8892174Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_complex128 SKIPPED [0.0027s] (Skipped! 
Op doesn't support autograd) [ 28%] 2025-12-04T14:00:07.8893392Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_float64 SKIPPED [0.0025s] (Skipped! Op doesn't support autograd) [ 28%] 2025-12-04T14:00:07.8894599Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_complex128 SKIPPED [0.0028s] (Skipped! Op doesn't support autograd) [ 28%] 2025-12-04T14:00:07.8895769Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_float64 SKIPPED [0.0025s] (Skipped! Op doesn't support autograd) [ 28%] 2025-12-04T14:00:07.8896933Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isneginf_cuda_float64 SKIPPED [0.0025s] (Skipped! Op doesn't support autograd) [ 28%] 2025-12-04T14:00:07.8898119Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isposinf_cuda_float64 SKIPPED [0.0027s] (Skipped! Op doesn't support autograd) [ 28%] 2025-12-04T14:00:07.8899187Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_complex128 PASSED [1.5048s] [ 28%] 2025-12-04T14:00:07.8900037Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_float64 PASSED [0.5618s] [ 28%] 2025-12-04T14:00:07.8901047Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nan_to_num_cuda_float64 SKIPPED [0.0002s] (Skipped! 
sparse backward not supported) [ 28%] 2025-12-04T14:00:07.8902134Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_complex128 PASSED [1.4003s] [ 28%] 2025-12-04T14:00:07.8902965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_float64 PASSED [0.5186s] [ 28%] 2025-12-04T14:00:07.8903853Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nn_functional_relu_cuda_float64 PASSED [0.3764s] [ 28%] 2025-12-04T14:00:07.8904811Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_complex128 PASSED [1.1605s] [ 28%] 2025-12-04T14:00:07.8905684Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_float64 PASSED [0.4378s] [ 28%] 2025-12-04T14:00:07.8906534Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_rad2deg_cuda_float64 PASSED [0.4994s] [ 28%] 2025-12-04T14:00:07.8907378Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_round_cuda_float64 PASSED [0.4523s] [ 28%] 2025-12-04T14:00:07.8908532Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%] 2025-12-04T14:00:07.8909699Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 28%] 2025-12-04T14:00:07.8910688Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sign_cuda_float64 PASSED [0.4572s] [ 28%] 2025-12-04T14:00:07.8911690Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_signbit_cuda_float64 SKIPPED [0.0027s] (Skipped! Op doesn't support autograd) [ 28%] 2025-12-04T14:00:07.8912867Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8914038Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_float64 SKIPPED [0.0002s] (Skipped! 
sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8915205Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8916385Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8917561Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_complex128 SKIPPED [0.0004s] (Skipped! sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8918732Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8919977Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8921194Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_float64 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8922374Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_complex128 SKIPPED [0.0002s] (Skipped! sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8923548Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_float64 SKIPPED [0.0002s] (Skipped! 
sparse backward not supported) [ 29%] 2025-12-04T14:00:07.8924555Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_trunc_cuda_float64 PASSED [0.4545s] [ 29%] 2025-12-04T14:00:07.8925407Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex128 PASSED [0.0035s] [ 29%] 2025-12-04T14:00:07.8926276Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex64 PASSED [0.0031s] [ 29%] 2025-12-04T14:00:07.8927127Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float32 PASSED [0.0033s] [ 29%] 2025-12-04T14:00:07.8927971Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float64 PASSED [0.0031s] [ 29%] 2025-12-04T14:00:07.8928905Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int16 PASSED [0.0031s] [ 29%] 2025-12-04T14:00:07.8929726Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int32 PASSED [0.0033s] [ 29%] 2025-12-04T14:00:07.8930542Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int64 PASSED [0.0030s] [ 29%] 2025-12-04T14:00:07.8931422Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int8 PASSED [0.0029s] [ 29%] 2025-12-04T14:00:07.8932241Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_uint8 PASSED [0.0031s] [ 29%] 2025-12-04T14:00:07.8933085Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex128 PASSED [0.0030s] [ 29%] 2025-12-04T14:00:07.8933949Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex64 PASSED [0.0029s] [ 29%] 2025-12-04T14:00:07.8934800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float32 PASSED [0.0032s] [ 29%] 2025-12-04T14:00:07.8935641Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float64 PASSED [0.0029s] [ 29%] 2025-12-04T14:00:07.8936468Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int16 PASSED [0.0029s] [ 29%] 2025-12-04T14:00:07.8937297Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int32 PASSED [0.0031s] [ 29%] 2025-12-04T14:00:07.8938121Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int64 PASSED [0.0031s] [ 29%] 2025-12-04T14:00:07.8938942Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int8 PASSED [0.0032s] [ 29%] 2025-12-04T14:00:07.8939806Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_uint8 PASSED [0.0031s] [ 29%] 2025-12-04T14:00:07.8940661Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex128 PASSED [0.0034s] [ 29%] 2025-12-04T14:00:07.8941534Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex64 PASSED [0.0031s] [ 29%] 2025-12-04T14:00:07.8942398Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float32 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8943249Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float64 PASSED [0.0033s] [ 30%] 2025-12-04T14:00:07.8944100Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int16 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8944933Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int32 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8945815Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int64 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8946638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int8 PASSED [0.0032s] [ 30%] 2025-12-04T14:00:07.8947510Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_uint8 PASSED [0.0030s] [ 30%] 2025-12-04T14:00:07.8948362Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex128 PASSED [0.0030s] 
[ 30%] 2025-12-04T14:00:07.8949277Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex64 PASSED [0.0033s] [ 30%] 2025-12-04T14:00:07.8950127Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float32 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8950965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float64 PASSED [0.0031s] [ 30%] 2025-12-04T14:00:07.8951802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int16 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8952626Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int32 PASSED [0.0032s] [ 30%] 2025-12-04T14:00:07.8953449Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int64 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8954269Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int8 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8955134Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_uint8 PASSED [0.0032s] [ 30%] 2025-12-04T14:00:07.8955987Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex128 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8956899Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex64 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8957760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float32 PASSED [0.0034s] [ 30%] 2025-12-04T14:00:07.8958615Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float64 PASSED [0.0030s] [ 30%] 2025-12-04T14:00:07.8959456Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int16 PASSED [0.0030s] [ 30%] 2025-12-04T14:00:07.8960291Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int32 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8961132Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int64 PASSED [0.0032s] [ 30%] 2025-12-04T14:00:07.8961965Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int8 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8962794Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_uint8 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8963638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float32 PASSED [0.0030s] [ 30%] 2025-12-04T14:00:07.8964482Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float64 PASSED [0.0033s] [ 30%] 2025-12-04T14:00:07.8965319Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int16 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8966149Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int32 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8966980Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int64 PASSED [0.0029s] [ 30%] 2025-12-04T14:00:07.8967802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int8 PASSED [0.0032s] [ 30%] 2025-12-04T14:00:07.8968653Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_uint8 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8969523Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex128 PASSED [0.0031s] [ 31%] 2025-12-04T14:00:07.8970395Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex64 PASSED [0.0030s] [ 31%] 2025-12-04T14:00:07.8971243Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float32 PASSED [0.0032s] [ 31%] 2025-12-04T14:00:07.8972133Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float64 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8973005Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int16 PASSED [0.0028s] [ 31%] 
2025-12-04T14:00:07.8973834Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int32 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8974663Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int64 PASSED [0.0032s] [ 31%] 2025-12-04T14:00:07.8975482Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int8 PASSED [0.0028s] [ 31%] 2025-12-04T14:00:07.8976301Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_uint8 PASSED [0.0028s] [ 31%] 2025-12-04T14:00:07.8977193Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex128 PASSED [0.0030s] [ 31%] 2025-12-04T14:00:07.8978141Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex64 PASSED [0.0033s] [ 31%] 2025-12-04T14:00:07.8979133Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float32 PASSED [0.0030s] [ 31%] 2025-12-04T14:00:07.8980053Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float64 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8981008Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int16 PASSED [0.0028s] [ 31%] 2025-12-04T14:00:07.8981915Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int32 PASSED [0.0031s] [ 31%] 2025-12-04T14:00:07.8982855Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int64 PASSED [0.0028s] [ 31%] 2025-12-04T14:00:07.8983744Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int8 PASSED [0.0028s] [ 31%] 2025-12-04T14:00:07.8984648Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_uint8 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8985532Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float32 PASSED [0.0033s] [ 31%] 
2025-12-04T14:00:07.8986395Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float64 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8987257Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int16 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8988109Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int32 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8989018Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int64 PASSED [0.0033s] [ 31%] 2025-12-04T14:00:07.8989859Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int8 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8990707Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_uint8 PASSED [0.0030s] [ 31%] 2025-12-04T14:00:07.8991553Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float32 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8992399Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float64 PASSED [0.0033s] [ 31%] 2025-12-04T14:00:07.8993228Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int16 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8994047Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int32 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8994861Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int64 PASSED [0.0029s] [ 31%] 2025-12-04T14:00:07.8995677Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int8 PASSED [0.0032s] [ 32%] 2025-12-04T14:00:07.8996491Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_uint8 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.8997379Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float32 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.8998243Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float64 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.8999220Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int16 PASSED [0.0034s] [ 32%] 2025-12-04T14:00:07.9000066Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int32 PASSED [0.0030s] [ 32%] 2025-12-04T14:00:07.9000907Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int64 PASSED [0.0030s] [ 32%] 2025-12-04T14:00:07.9001755Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int8 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9007162Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_uint8 PASSED [0.0032s] [ 32%] 2025-12-04T14:00:07.9008367Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex128 PASSED [0.0030s] [ 32%] 2025-12-04T14:00:07.9009313Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex64 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9010186Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float32 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9011037Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float64 PASSED [0.0033s] [ 32%] 2025-12-04T14:00:07.9011989Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int16 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9012826Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int32 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9013721Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int64 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9014544Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int8 PASSED [0.0033s] [ 32%] 2025-12-04T14:00:07.9015377Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_uint8 PASSED 
[0.0029s] [ 32%] 2025-12-04T14:00:07.9016217Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float32 PASSED [0.0030s] [ 32%] 2025-12-04T14:00:07.9017074Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float64 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9017918Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int16 PASSED [0.0032s] [ 32%] 2025-12-04T14:00:07.9018802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int32 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9019695Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int64 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9020531Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int8 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9021360Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_uint8 PASSED [0.0032s] [ 32%] 2025-12-04T14:00:07.9022198Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float32 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9023040Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float64 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9023896Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex128 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9024767Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex64 PASSED [0.0033s] [ 32%] 2025-12-04T14:00:07.9025629Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float32 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9026478Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float64 PASSED [0.0029s] [ 32%] 2025-12-04T14:00:07.9027318Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int16 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9028217Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int32 PASSED [0.0031s] [ 33%] 2025-12-04T14:00:07.9028604Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int64 PASSED [0.0028s] [ 33%] 2025-12-04T14:00:07.9029032Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int8 PASSED [0.0028s] [ 33%] 2025-12-04T14:00:07.9029399Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_uint8 PASSED [0.0028s] [ 33%] 2025-12-04T14:00:07.9029776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex128 PASSED [0.0032s] [ 33%] 2025-12-04T14:00:07.9030157Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex64 PASSED [0.0028s] [ 33%] 2025-12-04T14:00:07.9030524Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float32 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9030885Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float64 PASSED [0.0028s] [ 33%] 2025-12-04T14:00:07.9031241Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int16 PASSED [0.0032s] [ 33%] 2025-12-04T14:00:07.9031601Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int32 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9031955Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int64 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9032355Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int8 PASSED [0.0028s] [ 33%] 2025-12-04T14:00:07.9032711Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_uint8 PASSED [0.0032s] [ 33%] 2025-12-04T14:00:07.9033137Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float32 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9033519Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float64 PASSED 
[0.0029s] [ 33%] 2025-12-04T14:00:07.9033888Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int16 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9034256Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int32 PASSED [0.0032s] [ 33%] 2025-12-04T14:00:07.9034623Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int64 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9034991Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int8 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9035362Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_uint8 PASSED [0.0028s] [ 33%] 2025-12-04T14:00:07.9035739Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float32 PASSED [0.0033s] [ 33%] 2025-12-04T14:00:07.9036115Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float64 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9036481Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int16 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9036848Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int32 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9037220Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int64 PASSED [0.0032s] [ 33%] 2025-12-04T14:00:07.9037586Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int8 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9037955Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_uint8 PASSED [0.0028s] [ 33%] 2025-12-04T14:00:07.9038333Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex128 PASSED [0.0030s] [ 33%] 2025-12-04T14:00:07.9038707Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex64 PASSED [0.0033s] [ 33%] 2025-12-04T14:00:07.9039072Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float32 PASSED [0.0029s] [ 33%] 2025-12-04T14:00:07.9039479Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float64 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9039880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int16 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9040236Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int32 PASSED [0.0032s] [ 34%] 2025-12-04T14:00:07.9040594Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int64 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9040946Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int8 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9041301Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_uint8 PASSED [0.0030s] [ 34%] 2025-12-04T14:00:07.9041686Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float32 PASSED [0.0033s] [ 34%] 2025-12-04T14:00:07.9042066Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float64 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9042441Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int16 PASSED [0.0028s] [ 34%] 2025-12-04T14:00:07.9042818Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int32 PASSED [0.0028s] [ 34%] 2025-12-04T14:00:07.9043230Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int64 PASSED [0.0032s] [ 34%] 2025-12-04T14:00:07.9043605Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int8 PASSED [0.0028s] [ 34%] 2025-12-04T14:00:07.9043977Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_uint8 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9044385Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex128 PASSED [0.0030s] [ 34%] 2025-12-04T14:00:07.9044751Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex64 PASSED [0.0033s] [ 34%] 2025-12-04T14:00:07.9045109Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float32 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9045465Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float64 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9045817Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int16 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9046165Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int32 PASSED [0.0032s] [ 34%] 2025-12-04T14:00:07.9046512Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int64 PASSED [0.0028s] [ 34%] 2025-12-04T14:00:07.9046856Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int8 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9047199Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_uint8 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9047622Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float32 PASSED [0.0033s] [ 34%] 2025-12-04T14:00:07.9048043Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float64 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9048462Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int16 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9048923Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int32 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9049333Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int64 PASSED [0.0032s] [ 34%] 2025-12-04T14:00:07.9049743Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int8 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9050151Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_uint8 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9050548Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex128 PASSED [0.0029s] [ 34%] 2025-12-04T14:00:07.9050979Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex64 PASSED [0.0033s] [ 34%] 2025-12-04T14:00:07.9051395Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float32 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9051778Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float64 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9052146Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int16 PASSED [0.0028s] [ 35%] 2025-12-04T14:00:07.9052514Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int32 PASSED [0.0032s] [ 35%] 2025-12-04T14:00:07.9052882Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int64 PASSED [0.0028s] [ 35%] 2025-12-04T14:00:07.9053249Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int8 PASSED [0.0028s] [ 35%] 2025-12-04T14:00:07.9053621Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_uint8 PASSED [0.0028s] [ 35%] 2025-12-04T14:00:07.9053992Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float32 PASSED [0.0034s] [ 35%] 2025-12-04T14:00:07.9054364Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float64 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9054766Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int16 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9055128Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int32 PASSED [0.0030s] [ 35%] 2025-12-04T14:00:07.9055561Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int64 PASSED [0.0033s] [ 35%] 2025-12-04T14:00:07.9055919Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int8 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9056287Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_uint8 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9056647Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float32 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9057011Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float64 PASSED [0.0033s] [ 35%] 2025-12-04T14:00:07.9057378Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int16 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9057731Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int32 PASSED [0.0028s] [ 35%] 2025-12-04T14:00:07.9058092Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int64 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9058443Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int8 PASSED [0.0032s] [ 35%] 2025-12-04T14:00:07.9058834Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_uint8 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9059288Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex128 PASSED [0.0030s] [ 35%] 2025-12-04T14:00:07.9059656Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex64 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9060014Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float32 PASSED [0.0033s] [ 35%] 2025-12-04T14:00:07.9060370Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float64 PASSED 
[0.0030s] [ 35%] 2025-12-04T14:00:07.9060714Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int16 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9061068Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int32 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9061417Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int64 PASSED [0.0032s] [ 35%] 2025-12-04T14:00:07.9061763Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int8 PASSED [0.0029s] [ 35%] 2025-12-04T14:00:07.9062160Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_uint8 PASSED [0.0028s] [ 35%] 2025-12-04T14:00:07.9062558Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float32 PASSED [0.0030s] [ 35%] 2025-12-04T14:00:07.9062923Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float64 PASSED [0.0033s] [ 36%] 2025-12-04T14:00:07.9063273Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int16 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9063622Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int32 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9063977Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int64 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9064320Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int8 PASSED [0.0032s] [ 36%] 2025-12-04T14:00:07.9064678Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_uint8 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9065052Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float32 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9065425Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float64 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9065836Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int16 PASSED [0.0032s] [ 36%] 2025-12-04T14:00:07.9066199Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int32 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9066602Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int64 PASSED [0.0028s] [ 36%] 2025-12-04T14:00:07.9066963Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int8 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9067325Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_uint8 PASSED [0.0032s] [ 36%] 2025-12-04T14:00:07.9067697Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex128 PASSED [0.0030s] [ 36%] 2025-12-04T14:00:07.9068061Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex64 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9068423Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float32 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9068776Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float64 PASSED [0.0033s] [ 36%] 2025-12-04T14:00:07.9069118Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int16 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9069469Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int32 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9069811Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int64 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9070157Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int8 PASSED [0.0033s] [ 36%] 2025-12-04T14:00:07.9070506Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_uint8 PASSED [0.0030s] [ 36%] 2025-12-04T14:00:07.9070877Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex128 PASSED [0.0029s] [ 
36%] 2025-12-04T14:00:07.9071249Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex64 PASSED [0.0030s] [ 36%] 2025-12-04T14:00:07.9071606Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float32 PASSED [0.0034s] [ 36%] 2025-12-04T14:00:07.9071968Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float64 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9072322Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int16 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9072672Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int32 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9073066Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int64 PASSED [0.0033s] [ 36%] 2025-12-04T14:00:07.9073449Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int8 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9073800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_uint8 PASSED [0.0029s] [ 36%] 2025-12-04T14:00:07.9074176Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex128 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9074541Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex64 PASSED [0.0033s] [ 37%] 2025-12-04T14:00:07.9074907Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float32 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9075263Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float64 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9075613Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int16 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9075967Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int32 PASSED [0.0033s] [ 37%] 2025-12-04T14:00:07.9076315Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int64 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9076706Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int8 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9077055Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_uint8 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9077468Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex128 PASSED [0.0034s] [ 37%] 2025-12-04T14:00:07.9077833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex64 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9078187Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float32 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9078559Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float64 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9078943Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int16 PASSED [0.0033s] [ 37%] 2025-12-04T14:00:07.9079287Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int32 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9079636Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int64 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9079979Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int8 PASSED [0.0030s] [ 37%] 2025-12-04T14:00:07.9080329Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_uint8 PASSED [0.0033s] [ 37%] 2025-12-04T14:00:07.9080701Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex128 PASSED [0.0030s] [ 37%] 2025-12-04T14:00:07.9081066Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex64 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9081433Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float32 PASSED [0.0029s] [ 37%] 
2025-12-04T14:00:07.9081795Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float64 PASSED [0.0033s] [ 37%] 2025-12-04T14:00:07.9082150Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int16 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9082497Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int32 PASSED [0.0030s] [ 37%] 2025-12-04T14:00:07.9082845Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int64 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9083195Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int8 PASSED [0.0033s] [ 37%] 2025-12-04T14:00:07.9083544Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_uint8 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9083950Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float32 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9084357Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float64 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9084714Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int16 PASSED [0.0033s] [ 37%] 2025-12-04T14:00:07.9085072Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int32 PASSED [0.0029s] [ 37%] 2025-12-04T14:00:07.9085425Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int64 PASSED [0.0029s] [ 38%] 2025-12-04T14:00:07.9085777Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int8 PASSED [0.0029s] [ 38%] 2025-12-04T14:00:07.9086134Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_uint8 PASSED [0.0033s] [ 38%] 2025-12-04T14:00:07.9086484Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex128 PASSED [0.1911s] [ 38%] 2025-12-04T14:00:07.9086840Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex64 
PASSED [0.0067s] [ 38%] 2025-12-04T14:00:07.9087180Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float32 PASSED [0.1742s] [ 38%] 2025-12-04T14:00:07.9087520Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float64 PASSED [0.0065s] [ 38%] 2025-12-04T14:00:07.9087896Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int16 PASSED [0.1743s] [ 38%] 2025-12-04T14:00:07.9088230Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int32 PASSED [0.0063s] [ 38%] 2025-12-04T14:00:07.9088669Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int64 PASSED [0.1733s] [ 38%] 2025-12-04T14:00:07.9089040Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int8 PASSED [0.0064s] [ 38%] 2025-12-04T14:00:07.9089367Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_uint8 PASSED [0.1728s] [ 38%] 2025-12-04T14:00:07.9089729Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex128 PASSED [0.0066s] [ 38%] 2025-12-04T14:00:07.9090083Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex64 PASSED [0.1730s] [ 38%] 2025-12-04T14:00:07.9090430Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float32 PASSED [0.0065s] [ 38%] 2025-12-04T14:00:07.9090773Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float64 PASSED [0.1741s] [ 38%] 2025-12-04T14:00:07.9091107Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int16 PASSED [0.0066s] [ 38%] 2025-12-04T14:00:07.9091452Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int32 PASSED [0.1739s] [ 38%] 2025-12-04T14:00:07.9091783Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int64 PASSED [0.0066s] [ 38%] 2025-12-04T14:00:07.9092116Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int8 PASSED [0.1738s] [ 38%] 
2025-12-04T14:00:07.9092454Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_uint8 PASSED [0.0066s] [ 38%] 2025-12-04T14:00:07.9092815Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex128 PASSED [0.1740s] [ 38%] 2025-12-04T14:00:07.9093176Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex64 PASSED [0.0067s] [ 38%] 2025-12-04T14:00:07.9093523Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float32 PASSED [0.1739s] [ 38%] 2025-12-04T14:00:07.9093870Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float64 PASSED [0.0066s] [ 38%] 2025-12-04T14:00:07.9094213Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int16 PASSED [0.1738s] [ 38%] 2025-12-04T14:00:07.9094550Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int32 PASSED [0.0065s] [ 38%] 2025-12-04T14:00:07.9094936Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int64 PASSED [0.1740s] [ 38%] 2025-12-04T14:00:07.9095312Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int8 PASSED [0.0066s] [ 38%] 2025-12-04T14:00:07.9095655Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_uint8 PASSED [0.1735s] [ 38%] 2025-12-04T14:00:07.9096018Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex128 PASSED [0.0067s] [ 38%] 2025-12-04T14:00:07.9096370Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex64 PASSED [0.1740s] [ 39%] 2025-12-04T14:00:07.9096719Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float32 PASSED [0.0067s] [ 39%] 2025-12-04T14:00:07.9097062Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float64 PASSED [0.1740s] [ 39%] 2025-12-04T14:00:07.9097395Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int16 PASSED [0.0065s] [ 39%] 
2025-12-04T14:00:07.9097733Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int32 PASSED [0.1737s] [ 39%] 2025-12-04T14:00:07.9098067Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int64 PASSED [0.0066s] [ 39%] 2025-12-04T14:00:07.9098401Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int8 PASSED [0.1740s] [ 39%] 2025-12-04T14:00:07.9098803Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_uint8 PASSED [0.0066s] [ 39%] 2025-12-04T14:00:07.9099227Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex128 PASSED [0.1749s] [ 39%] 2025-12-04T14:00:07.9099734Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex64 PASSED [0.0066s] [ 39%] 2025-12-04T14:00:07.9100110Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float32 PASSED [0.1739s] [ 39%] 2025-12-04T14:00:07.9100481Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float64 PASSED [0.0066s] [ 39%] 2025-12-04T14:00:07.9100849Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int16 PASSED [0.1742s] [ 39%] 2025-12-04T14:00:07.9101214Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int32 PASSED [0.0066s] [ 39%] 2025-12-04T14:00:07.9101578Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int64 PASSED [0.1739s] [ 39%] 2025-12-04T14:00:07.9101938Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int8 PASSED [0.0066s] [ 39%] 2025-12-04T14:00:07.9102301Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_uint8 PASSED [0.1738s] [ 39%] 2025-12-04T14:00:07.9102675Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float32 PASSED [0.0066s] [ 39%] 2025-12-04T14:00:07.9103042Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float64 PASSED [0.1743s] [ 39%] 
2025-12-04T14:00:07.9103404Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int16 PASSED [0.0064s] [ 39%] 2025-12-04T14:00:07.9103761Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int32 PASSED [0.1742s] [ 39%] 2025-12-04T14:00:07.9104121Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int64 PASSED [0.0063s] [ 39%] 2025-12-04T14:00:07.9104480Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int8 PASSED [0.1736s] [ 39%] 2025-12-04T14:00:07.9104837Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_uint8 PASSED [0.0064s] [ 39%] 2025-12-04T14:00:07.9105222Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex128 PASSED [0.1743s] [ 39%] 2025-12-04T14:00:07.9105601Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex64 PASSED [0.0066s] [ 39%] 2025-12-04T14:00:07.9105968Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float32 PASSED [0.1742s] [ 39%] 2025-12-04T14:00:07.9106383Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float64 PASSED [0.0065s] [ 39%] 2025-12-04T14:00:07.9106741Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int16 PASSED [0.1737s] [ 39%] 2025-12-04T14:00:07.9107142Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int32 PASSED [0.0063s] [ 39%] 2025-12-04T14:00:07.9107502Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int64 PASSED [0.1740s] [ 39%] 2025-12-04T14:00:07.9108015Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int8 PASSED [0.0063s] [ 40%] 2025-12-04T14:00:07.9108352Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_uint8 PASSED [0.1738s] [ 40%] 2025-12-04T14:00:07.9108801Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex128 PASSED [0.0066s] [ 40%] 
2025-12-04T14:00:07.9109201Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex64 PASSED [0.1743s] [ 40%] 2025-12-04T14:00:07.9109585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float32 PASSED [0.0065s] [ 40%] 2025-12-04T14:00:07.9109969Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float64 PASSED [0.1738s] [ 40%] 2025-12-04T14:00:07.9110342Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int16 PASSED [0.0063s] [ 40%] 2025-12-04T14:00:07.9110785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int32 PASSED [0.1737s] [ 40%] 2025-12-04T14:00:07.9111159Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int64 PASSED [0.0063s] [ 40%] 2025-12-04T14:00:07.9111585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int8 PASSED [0.1741s] [ 40%] 2025-12-04T14:00:07.9111956Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_uint8 PASSED [0.0063s] [ 40%] 2025-12-04T14:00:07.9112321Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float32 PASSED [0.1745s] [ 40%] 2025-12-04T14:00:07.9112680Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float64 PASSED [0.0066s] [ 40%] 2025-12-04T14:00:07.9113026Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int16 PASSED [0.1743s] [ 40%] 2025-12-04T14:00:07.9113377Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int32 PASSED [0.0065s] [ 40%] 2025-12-04T14:00:07.9113722Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int64 PASSED [0.1735s] [ 40%] 2025-12-04T14:00:07.9114070Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int8 PASSED [0.0066s] [ 40%] 2025-12-04T14:00:07.9114417Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_uint8 PASSED [0.1738s] [ 40%] 2025-12-04T14:00:07.9114755Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float32 PASSED [0.0066s] [ 40%] 2025-12-04T14:00:07.9115099Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float64 PASSED [0.1742s] [ 40%] 2025-12-04T14:00:07.9115432Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int16 PASSED [0.0065s] [ 40%] 2025-12-04T14:00:07.9115764Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int32 PASSED [0.1745s] [ 40%] 2025-12-04T14:00:07.9116094Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int64 PASSED [0.0066s] [ 40%] 2025-12-04T14:00:07.9116421Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int8 PASSED [0.1742s] [ 40%] 2025-12-04T14:00:07.9116759Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_uint8 PASSED [0.0066s] [ 40%] 2025-12-04T14:00:07.9117108Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float32 PASSED [0.1740s] [ 40%] 2025-12-04T14:00:07.9117462Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float64 PASSED [0.0065s] [ 40%] 2025-12-04T14:00:07.9117864Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int16 PASSED [0.1745s] [ 40%] 2025-12-04T14:00:07.9118266Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int32 PASSED [0.0065s] [ 40%] 2025-12-04T14:00:07.9118619Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int64 PASSED [0.1749s] [ 40%] 2025-12-04T14:00:07.9119003Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int8 PASSED [0.0065s] [ 40%] 2025-12-04T14:00:07.9119348Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_uint8 PASSED [0.1746s] [ 41%] 2025-12-04T14:00:07.9119710Z 
test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex128 PASSED [0.0066s] [ 41%]
2025-12-04T14:00:07.9120065Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex64 PASSED [0.1745s] [ 41%]
2025-12-04T14:00:07.9120414Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float32 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9120762Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float64 PASSED [0.1746s] [ 41%]
2025-12-04T14:00:07.9121101Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int16 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9121482Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int32 PASSED [0.1741s] [ 41%]
2025-12-04T14:00:07.9121821Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int64 PASSED [0.0066s] [ 41%]
2025-12-04T14:00:07.9122158Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int8 PASSED [0.1746s] [ 41%]
2025-12-04T14:00:07.9122537Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_uint8 PASSED [0.0066s] [ 41%]
2025-12-04T14:00:07.9122883Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float32 PASSED [0.1747s] [ 41%]
2025-12-04T14:00:07.9123235Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float64 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9123574Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int16 PASSED [0.1750s] [ 41%]
2025-12-04T14:00:07.9123915Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int32 PASSED [0.0063s] [ 41%]
2025-12-04T14:00:07.9124254Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int64 PASSED [0.1745s] [ 41%]
2025-12-04T14:00:07.9124587Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int8 PASSED [0.0063s] [ 41%]
2025-12-04T14:00:07.9124927Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_uint8 PASSED [0.1739s] [ 41%]
2025-12-04T14:00:07.9125273Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float32 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9125618Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float64 PASSED [0.1747s] [ 41%]
2025-12-04T14:00:07.9125981Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex128 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9126341Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex64 PASSED [0.1742s] [ 41%]
2025-12-04T14:00:07.9126690Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float32 PASSED [0.0064s] [ 41%]
2025-12-04T14:00:07.9127037Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float64 PASSED [0.1748s] [ 41%]
2025-12-04T14:00:07.9127377Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int16 PASSED [0.0064s] [ 41%]
2025-12-04T14:00:07.9127718Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int32 PASSED [0.1746s] [ 41%]
2025-12-04T14:00:07.9128055Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int64 PASSED [0.0064s] [ 41%]
2025-12-04T14:00:07.9128390Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int8 PASSED [0.1746s] [ 41%]
2025-12-04T14:00:07.9128773Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_uint8 PASSED [0.0064s] [ 41%]
2025-12-04T14:00:07.9129177Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex128 PASSED [0.1745s] [ 41%]
2025-12-04T14:00:07.9129540Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex64 PASSED [0.0065s] [ 41%]
2025-12-04T14:00:07.9129887Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float32 PASSED [0.1749s] [ 41%]
2025-12-04T14:00:07.9130236Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float64 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9130577Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int16 PASSED [0.1743s] [ 42%]
2025-12-04T14:00:07.9130914Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int32 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9131253Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int64 PASSED [0.1749s] [ 42%]
2025-12-04T14:00:07.9131590Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int8 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9131934Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_uint8 PASSED [0.1748s] [ 42%]
2025-12-04T14:00:07.9132337Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float32 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9132700Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float64 PASSED [0.1747s] [ 42%]
2025-12-04T14:00:07.9133056Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int16 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9133447Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int32 PASSED [0.1747s] [ 42%]
2025-12-04T14:00:07.9133800Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int64 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9134149Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int8 PASSED [0.1745s] [ 42%]
2025-12-04T14:00:07.9134500Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_uint8 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9134863Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float32 PASSED [0.1751s] [ 42%]
2025-12-04T14:00:07.9135222Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float64 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9135573Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int16 PASSED [0.1748s] [ 42%]
2025-12-04T14:00:07.9135925Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int32 PASSED [0.0063s] [ 42%]
2025-12-04T14:00:07.9136274Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int64 PASSED [0.1747s] [ 42%]
2025-12-04T14:00:07.9136625Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int8 PASSED [0.0064s] [ 42%]
2025-12-04T14:00:07.9136976Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_uint8 PASSED [0.1748s] [ 42%]
2025-12-04T14:00:07.9137342Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex128 PASSED [0.0066s] [ 42%]
2025-12-04T14:00:07.9137697Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex64 PASSED [0.1751s] [ 42%]
2025-12-04T14:00:07.9138043Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float32 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9138391Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float64 PASSED [0.1752s] [ 42%]
2025-12-04T14:00:07.9138760Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int16 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9139167Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int32 PASSED [0.1750s] [ 42%]
2025-12-04T14:00:07.9139508Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int64 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9139892Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int8 PASSED [0.1750s] [ 42%]
2025-12-04T14:00:07.9140296Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_uint8 PASSED [0.0066s] [ 42%]
2025-12-04T14:00:07.9140670Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float32 PASSED [0.1749s] [ 42%]
2025-12-04T14:00:07.9141036Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float64 PASSED [0.0065s] [ 42%]
2025-12-04T14:00:07.9141396Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int16 PASSED [0.1747s] [ 43%]
2025-12-04T14:00:07.9141754Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int32 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9142113Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int64 PASSED [0.1749s] [ 43%]
2025-12-04T14:00:07.9142469Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int8 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9142826Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_uint8 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9143180Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex128 PASSED [0.0066s] [ 43%]
2025-12-04T14:00:07.9143570Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex64 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9143912Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float32 PASSED [0.0065s] [ 43%]
2025-12-04T14:00:07.9144290Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float64 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9144620Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int16 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9144960Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int32 PASSED [0.1750s] [ 43%]
2025-12-04T14:00:07.9145293Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int64 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9145633Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int8 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9145967Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_uint8 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9146375Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float32 PASSED [0.1754s] [ 43%]
2025-12-04T14:00:07.9146787Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float64 PASSED [0.0066s] [ 43%]
2025-12-04T14:00:07.9147186Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int16 PASSED [0.1752s] [ 43%]
2025-12-04T14:00:07.9147585Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int32 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9147985Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int64 PASSED [0.1751s] [ 43%]
2025-12-04T14:00:07.9148383Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int8 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9148780Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_uint8 PASSED [0.1751s] [ 43%]
2025-12-04T14:00:07.9149165Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex128 PASSED [0.0065s] [ 43%]
2025-12-04T14:00:07.9149546Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex64 PASSED [0.1753s] [ 43%]
2025-12-04T14:00:07.9149914Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float32 PASSED [0.0065s] [ 43%]
2025-12-04T14:00:07.9150274Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float64 PASSED [0.1750s] [ 43%]
2025-12-04T14:00:07.9150628Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int16 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9151026Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int32 PASSED [0.1755s] [ 43%]
2025-12-04T14:00:07.9151415Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int64 PASSED [0.0064s] [ 43%]
2025-12-04T14:00:07.9151777Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int8 PASSED [0.1754s] [ 43%]
2025-12-04T14:00:07.9152133Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_uint8 PASSED [0.0063s] [ 43%]
2025-12-04T14:00:07.9152491Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float32 PASSED [0.1749s] [ 43%]
2025-12-04T14:00:07.9152854Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float64 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9153201Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int16 PASSED [0.1754s] [ 44%]
2025-12-04T14:00:07.9153554Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int32 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9153899Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int64 PASSED [0.1754s] [ 44%]
2025-12-04T14:00:07.9154254Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int8 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9154642Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_uint8 PASSED [0.1757s] [ 44%]
2025-12-04T14:00:07.9154991Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float32 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9155386Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float64 PASSED [0.1755s] [ 44%]
2025-12-04T14:00:07.9155723Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int16 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9156073Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int32 PASSED [0.1752s] [ 44%]
2025-12-04T14:00:07.9156413Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int64 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9156752Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int8 PASSED [0.1751s] [ 44%]
2025-12-04T14:00:07.9157095Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_uint8 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9157454Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex128 PASSED [0.1757s] [ 44%]
2025-12-04T14:00:07.9157803Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex64 PASSED [0.0066s] [ 44%]
2025-12-04T14:00:07.9158146Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float32 PASSED [0.1754s] [ 44%]
2025-12-04T14:00:07.9158486Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float64 PASSED [0.0065s] [ 44%]
2025-12-04T14:00:07.9158820Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int16 PASSED [0.1753s] [ 44%]
2025-12-04T14:00:07.9159152Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int32 PASSED [0.0063s] [ 44%]
2025-12-04T14:00:07.9159489Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int64 PASSED [0.1755s] [ 44%]
2025-12-04T14:00:07.9159822Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int8 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9160151Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_uint8 PASSED [0.1748s] [ 44%]
2025-12-04T14:00:07.9160496Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float32 PASSED [0.0065s] [ 44%]
2025-12-04T14:00:07.9160842Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float64 PASSED [0.1756s] [ 44%]
2025-12-04T14:00:07.9161184Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int16 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9161524Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int32 PASSED [0.1755s] [ 44%]
2025-12-04T14:00:07.9161905Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int64 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9162279Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int8 PASSED [0.1757s] [ 44%]
2025-12-04T14:00:07.9162616Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_uint8 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9162976Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float32 PASSED [0.1756s] [ 44%]
2025-12-04T14:00:07.9163339Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float64 PASSED [0.0064s] [ 44%]
2025-12-04T14:00:07.9163691Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int16 PASSED [0.1757s] [ 45%]
2025-12-04T14:00:07.9164041Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int32 PASSED [0.0064s] [ 45%]
2025-12-04T14:00:07.9164390Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int64 PASSED [0.1749s] [ 45%]
2025-12-04T14:00:07.9164734Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int8 PASSED [0.0065s] [ 45%]
2025-12-04T14:00:07.9165086Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_uint8 PASSED [0.1755s] [ 45%]
2025-12-04T14:00:07.9165481Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex128 PASSED [0.0067s] [ 45%]
2025-12-04T14:00:07.9165833Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex64 PASSED [0.1753s] [ 45%]
2025-12-04T14:00:07.9166175Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float32 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9166552Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float64 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9166887Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int16 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9167219Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int32 PASSED [0.1757s] [ 45%]
2025-12-04T14:00:07.9167549Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int64 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9167877Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int8 PASSED [0.1752s] [ 45%]
2025-12-04T14:00:07.9168207Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_uint8 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9168578Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex128 PASSED [0.1755s] [ 45%]
2025-12-04T14:00:07.9168929Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex64 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9169276Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float32 PASSED [0.1756s] [ 45%]
2025-12-04T14:00:07.9169619Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float64 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9169955Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int16 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9170294Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int32 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9170626Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int64 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9170960Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int8 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9171296Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_uint8 PASSED [0.1760s] [ 45%]
2025-12-04T14:00:07.9171655Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex128 PASSED [0.0067s] [ 45%]
2025-12-04T14:00:07.9172010Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex64 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9172351Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float32 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9172740Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float64 PASSED [0.1759s] [ 45%]
2025-12-04T14:00:07.9173115Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int16 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9173448Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int32 PASSED [0.1756s] [ 45%]
2025-12-04T14:00:07.9173785Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int64 PASSED [0.0066s] [ 45%]
2025-12-04T14:00:07.9174115Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int8 PASSED [0.1761s] [ 45%]
2025-12-04T14:00:07.9174452Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_uint8 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9174804Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex128 PASSED [0.1762s] [ 46%]
2025-12-04T14:00:07.9175153Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex64 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9175504Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float32 PASSED [0.1755s] [ 46%]
2025-12-04T14:00:07.9175843Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float64 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9176172Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int16 PASSED [0.1760s] [ 46%]
2025-12-04T14:00:07.9176549Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int32 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9176880Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int64 PASSED [0.1759s] [ 46%]
2025-12-04T14:00:07.9177246Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int8 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9177582Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_uint8 PASSED [0.1759s] [ 46%]
2025-12-04T14:00:07.9177940Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex128 PASSED [0.0067s] [ 46%]
2025-12-04T14:00:07.9178295Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex64 PASSED [0.1761s] [ 46%]
2025-12-04T14:00:07.9178638Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float32 PASSED [0.0065s] [ 46%]
2025-12-04T14:00:07.9178983Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float64 PASSED [0.1761s] [ 46%]
2025-12-04T14:00:07.9179372Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int16 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9179706Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int32 PASSED [0.1761s] [ 46%]
2025-12-04T14:00:07.9180044Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int64 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9180376Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int8 PASSED [0.1763s] [ 46%]
2025-12-04T14:00:07.9180715Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_uint8 PASSED [0.0066s] [ 46%]
2025-12-04T14:00:07.9181071Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float32 PASSED [0.1764s] [ 46%]
2025-12-04T14:00:07.9181421Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float64 PASSED [0.0065s] [ 46%]
2025-12-04T14:00:07.9181777Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int16 PASSED [0.1761s] [ 46%]
2025-12-04T14:00:07.9182123Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int32 PASSED [0.0064s] [ 46%]
2025-12-04T14:00:07.9182462Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int64 PASSED [0.1762s] [ 46%]
2025-12-04T14:00:07.9182802Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int8 PASSED [0.0064s] [ 46%]
2025-12-04T14:00:07.9183143Z test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_uint8 PASSED [0.1757s] [ 46%]
2025-12-04T14:00:07.9183656Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_bfloat16 PASSED [0.0563s] [ 46%]
2025-12-04T14:00:07.9184119Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float16 PASSED [0.0347s] [ 46%]
2025-12-04T14:00:07.9184540Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float32 PASSED [0.0340s] [ 46%]
2025-12-04T14:00:07.9184966Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float64 PASSED [0.0339s] [ 46%]
2025-12-04T14:00:07.9185384Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int16 PASSED [0.0525s] [ 46%]
2025-12-04T14:00:07.9185810Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int32 PASSED [0.0260s] [ 47%]
2025-12-04T14:00:07.9186221Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int64 PASSED [0.0254s] [ 47%]
2025-12-04T14:00:07.9186632Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int8 PASSED [0.0258s] [ 47%]
2025-12-04T14:00:07.9187052Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_uint8 PASSED [0.0258s] [ 47%]
2025-12-04T14:00:07.9187477Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_bfloat16 PASSED [0.0341s] [ 47%]
2025-12-04T14:00:07.9187943Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float16 PASSED [0.0339s] [ 47%]
2025-12-04T14:00:07.9188368Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float32 PASSED [0.0339s] [ 47%]
2025-12-04T14:00:07.9188829Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float64 PASSED [0.0340s] [ 47%]
2025-12-04T14:00:07.9189242Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int16 PASSED [0.0256s] [ 47%]
2025-12-04T14:00:07.9189655Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int32 PASSED [0.0256s] [ 47%]
2025-12-04T14:00:07.9190073Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int64 PASSED [0.0253s] [ 47%]
2025-12-04T14:00:07.9190482Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int8 PASSED [0.0256s] [ 47%]
2025-12-04T14:00:07.9190895Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_uint8 PASSED [0.0253s] [ 47%]
2025-12-04T14:00:07.9191322Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bfloat16 PASSED [1.4614s] [ 47%]
2025-12-04T14:00:07.9191732Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bool PASSED [1.4099s] [ 47%]
2025-12-04T14:00:07.9192166Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex128 PASSED [3.7291s] [ 47%]
2025-12-04T14:00:07.9192600Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex64 PASSED [2.2238s] [ 47%]
2025-12-04T14:00:07.9193027Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float16 PASSED [1.4384s] [ 47%]
2025-12-04T14:00:07.9193456Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float32 PASSED [0.7592s] [ 47%]
2025-12-04T14:00:07.9193874Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float64 PASSED [0.7599s] [ 47%]
2025-12-04T14:00:07.9194285Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int16 PASSED [0.0261s] [ 47%]
2025-12-04T14:00:07.9194704Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int32 PASSED [0.0257s] [ 47%]
2025-12-04T14:00:07.9195118Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int64 PASSED [0.0252s] [ 47%]
2025-12-04T14:00:07.9195575Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int8 PASSED [0.0258s] [ 47%]
2025-12-04T14:00:07.9196027Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_uint8 PASSED [0.0257s] [ 47%]
2025-12-04T14:00:07.9196454Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bfloat16 PASSED [0.0423s] [ 47%]
2025-12-04T14:00:07.9196866Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bool PASSED [0.0304s] [ 47%]
2025-12-04T14:00:07.9197300Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex128 PASSED [0.0302s] [ 47%]
2025-12-04T14:00:07.9197729Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex64 PASSED [0.0302s] [ 47%]
2025-12-04T14:00:07.9198146Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float16 PASSED [0.0372s] [ 47%]
2025-12-04T14:00:07.9198563Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float32 PASSED [0.0372s] [ 47%]
2025-12-04T14:00:07.9198985Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float64 PASSED [0.0374s] [ 48%]
2025-12-04T14:00:07.9199434Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int16 PASSED [0.0287s] [ 48%]
2025-12-04T14:00:07.9199843Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int32 PASSED [0.0285s] [ 48%]
2025-12-04T14:00:07.9200249Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int64 PASSED [0.0281s] [ 48%]
2025-12-04T14:00:07.9200693Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int8 PASSED [0.0285s] [ 48%]
2025-12-04T14:00:07.9201104Z test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_uint8 PASSED [0.0288s] [ 48%]
2025-12-04T14:00:07.9201414Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_bfloat16 PASSED [0.0428s] [ 48%]
2025-12-04T14:00:07.9201731Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_complex128 PASSED [0.0131s] [ 48%]
2025-12-04T14:00:07.9202031Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_float64 PASSED [0.0127s] [ 48%]
2025-12-04T14:00:07.9202504Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_complex128 SKIPPED [0.0002s] (multi-GPU not supported) [ 48%]
2025-12-04T14:00:07.9202958Z test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_float64 SKIPPED [0.0002s] (multi-GPU not supported) [ 48%]
2025-12-04T14:00:07.9203293Z test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_complex128 PASSED [0.0023s] [ 48%]
2025-12-04T14:00:07.9203614Z test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_float64 PASSED [0.0025s] [ 48%]
2025-12-04T14:00:07.9203916Z test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_complex128 PASSED [0.0029s] [ 48%]
2025-12-04T14:00:07.9204201Z test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_float64 PASSED [0.0028s] [ 48%]
2025-12-04T14:00:07.9204475Z test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_complex128 PASSED [0.0026s] [ 48%]
2025-12-04T14:00:07.9204729Z test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_float64 PASSED [0.0025s] [ 48%]
2025-12-04T14:00:07.9204993Z test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_complex128 PASSED [0.0161s] [ 48%]
2025-12-04T14:00:07.9205243Z test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_float64 PASSED [0.0159s] [ 48%]
2025-12-04T14:00:07.9205480Z test_sparse.py::TestSparseCUDA::test_any_cuda PASSED [0.0023s] [ 48%]
2025-12-04T14:00:07.9205742Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float32 PASSED [0.0131s] [ 48%]
2025-12-04T14:00:07.9206000Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float64 PASSED [0.0129s] [ 48%]
2025-12-04T14:00:07.9206294Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int16 PASSED [0.0102s] [ 48%]
2025-12-04T14:00:07.9206548Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int32 PASSED [0.0102s] [ 48%]
2025-12-04T14:00:07.9206834Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int64 PASSED [0.0108s] [ 48%]
2025-12-04T14:00:07.9207084Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int8 PASSED [0.0101s] [ 48%]
2025-12-04T14:00:07.9207330Z test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_uint8 PASSED [0.0101s] [ 48%]
2025-12-04T14:00:07.9207567Z test_sparse.py::TestSparseCUDA::test_assign_cuda_float64 PASSED [0.0030s] [ 48%]
2025-12-04T14:00:07.9208070Z test_sparse.py::TestSparseCUDA::test_basic_cuda_complex128 PASSED [0.0151s] [ 48%]
2025-12-04T14:00:07.9208400Z test_sparse.py::TestSparseCUDA::test_basic_cuda_float64 PASSED [0.0144s] [ 48%]
2025-12-04T14:00:07.9208681Z test_sparse.py::TestSparseCUDA::test_basic_ops_cuda_float64 PASSED [0.3694s] [ 48%]
2025-12-04T14:00:07.9208923Z test_sparse.py::TestSparseCUDA::test_bmm_cuda_float64 PASSED [0.2602s] [ 49%]
2025-12-04T14:00:07.9209206Z test_sparse.py::TestSparseCUDA::test_bmm_deterministic_cuda_float64 PASSED [0.1863s] [ 49%]
2025-12-04T14:00:07.9209444Z test_sparse.py::TestSparseCUDA::test_bmm_oob_cuda PASSED [0.0367s] [ 49%]
2025-12-04T14:00:07.9210183Z test_sparse.py::TestSparseCUDA::test_bmm_windows_error_cuda_float64 SKIPPED [0.0003s] (this test ensures bmm sparse-dense CUDA gives an error when run on Windows with CUDA < 11.0) [ 49%]
2025-12-04T14:00:07.9210434Z test_sparse.py::TestSparseCUDA::test_cat_cuda_complex128 PASSED [0.0381s] [ 49%]
2025-12-04T14:00:07.9210667Z test_sparse.py::TestSparseCUDA::test_cat_cuda_float64 PASSED [0.0367s] [ 49%]
2025-12-04T14:00:07.9211041Z test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_complex128 PASSED [0.0040s] [ 49%]
2025-12-04T14:00:07.9211355Z test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_float64 PASSED [0.0031s] [ 49%]
2025-12-04T14:00:07.9211605Z test_sparse.py::TestSparseCUDA::test_clone_cuda_complex128 PASSED [0.0075s] [ 49%]
2025-12-04T14:00:07.9211839Z test_sparse.py::TestSparseCUDA::test_clone_cuda_float64 PASSED [0.0074s] [ 49%]
2025-12-04T14:00:07.9212305Z test_sparse.py::TestSparseCUDA::test_coalesce_accepts_large_tensor_cuda_float32 SKIPPED [0.1715s] (Insufficient cuda memory) [ 49%]
2025-12-04T14:00:07.9212559Z test_sparse.py::TestSparseCUDA::test_coalesce_cuda_bfloat16 PASSED [0.0205s] [ 49%]
2025-12-04T14:00:07.9212823Z test_sparse.py::TestSparseCUDA::test_coalesce_cuda_complex128 PASSED [0.0189s] [ 49%]
2025-12-04T14:00:07.9213070Z test_sparse.py::TestSparseCUDA::test_coalesce_cuda_float64 PASSED [0.0178s] [ 49%]
2025-12-04T14:00:07.9213389Z test_sparse.py::TestSparseCUDA::test_coalesce_reference_cycle_cuda_float64 PASSED [0.0022s] [ 49%]
2025-12-04T14:00:07.9213780Z test_sparse.py::TestSparseCUDA::test_coalesce_transpose_mm_cuda_float64 SKIPPED [0.0013s] (Only runs on cpu) [ 49%]
2025-12-04T14:00:07.9214037Z test_sparse.py::TestSparseCUDA::test_contig_cuda_complex128 PASSED [0.0056s] [ 49%]
2025-12-04T14:00:07.9214280Z test_sparse.py::TestSparseCUDA::test_contig_cuda_float64 PASSED [0.0055s] [ 49%]
2025-12-04T14:00:07.9214563Z test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_complex128 PASSED [0.0055s] [ 49%]
2025-12-04T14:00:07.9214823Z test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_float64 PASSED [0.0053s] [ 49%]
2025-12-04T14:00:07.9215177Z test_sparse.py::TestSparseCUDA::test_ctor_is_coalesced_with_gradcheck_cuda_float64 PASSED [0.2881s] [ 49%]
2025-12-04T14:00:07.9215457Z test_sparse.py::TestSparseCUDA::test_ctor_large_sizes_cuda_float64 PASSED [0.0021s] [ 49%]
2025-12-04T14:00:07.9215745Z test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_complex128 PASSED [0.0018s] [ 49%]
2025-12-04T14:00:07.9216019Z test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_float64 PASSED [0.0019s] [ 49%]
2025-12-04T14:00:07.9216254Z test_sparse.py::TestSparseCUDA::test_cuda_empty_cuda PASSED [0.0022s] [ 49%]
2025-12-04T14:00:07.9216596Z test_sparse.py::TestSparseCUDA::test_div_by_sparse_error_cuda PASSED [0.0019s] [ 49%]
2025-12-04T14:00:07.9216874Z test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float32 PASSED [0.0117s] [ 49%]
2025-12-04T14:00:07.9217228Z test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float64 PASSED [0.0114s] [ 49%]
2025-12-04T14:00:07.9217476Z test_sparse.py::TestSparseCUDA::test_dsmm_cuda_float64 PASSED [0.0622s] [ 49%]
2025-12-04T14:00:07.9217765Z test_sparse.py::TestSparseCUDA::test_dtypes_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 49%]
2025-12-04T14:00:07.9218190Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bfloat16 SKIPPED [0.0017s] (Only runs on cpu) [ 49%]
2025-12-04T14:00:07.9218623Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bool SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9219137Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9219568Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9219983Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float16 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9220445Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9220856Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9221259Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int16 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9221739Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9222141Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9222548Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int8 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9222953Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_uint8 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9223367Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bfloat16 SKIPPED [0.0016s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9223767Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bool SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9224193Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9224611Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9225018Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float16 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9225424Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9225833Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float64 SKIPPED [0.0017s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9226232Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int16 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9226636Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9227035Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9227427Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int8 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9227877Z test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_uint8 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9228183Z test_sparse.py::TestSparseCUDA::test_empty_like_cuda_complex128 PASSED [0.0099s] [ 50%]
2025-12-04T14:00:07.9228448Z test_sparse.py::TestSparseCUDA::test_empty_like_cuda_float64 PASSED [0.0086s] [ 50%]
2025-12-04T14:00:07.9228762Z test_sparse.py::TestSparseCUDA::test_factory_copy_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9229098Z test_sparse.py::TestSparseCUDA::test_factory_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9229434Z test_sparse.py::TestSparseCUDA::test_factory_cuda_complex64 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9229753Z test_sparse.py::TestSparseCUDA::test_factory_cuda_float16 SKIPPED [0.0013s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9230074Z test_sparse.py::TestSparseCUDA::test_factory_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9230395Z test_sparse.py::TestSparseCUDA::test_factory_cuda_float64 SKIPPED [0.0017s] (Only runs on cpu) [ 50%]
2025-12-04T14:00:07.9230694Z test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_complex128 PASSED [0.0022s] [ 51%]
2025-12-04T14:00:07.9230979Z test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_float64 PASSED [0.0020s] [ 51%]
2025-12-04T14:00:07.9231328Z test_sparse.py::TestSparseCUDA::test_factory_device_type_inference_cuda PASSED [0.0047s] [ 51%]
2025-12-04T14:00:07.9231598Z test_sparse.py::TestSparseCUDA::test_factory_empty_indices_cuda PASSED [0.0019s] [ 51%]
2025-12-04T14:00:07.9231867Z test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_complex128 PASSED [0.0021s] [ 51%]
2025-12-04T14:00:07.9232167Z test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_float64 PASSED [0.0025s] [ 51%]
2025-12-04T14:00:07.9232460Z test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_complex128 PASSED [0.0037s] [ 51%]
2025-12-04T14:00:07.9232738Z test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_float64 PASSED [0.0035s] [ 51%]
2025-12-04T14:00:07.9233038Z test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_complex128 PASSED [0.0029s] [ 51%]
2025-12-04T14:00:07.9233328Z test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_float64 PASSED [0.0027s] [ 51%]
2025-12-04T14:00:07.9233733Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex128 SKIPPED [0.0013s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9234136Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex64 SKIPPED [0.0013s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9234524Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float16 SKIPPED [0.0017s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9234911Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9235303Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9235683Z test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_int64 SKIPPED [0.0012s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9235987Z test_sparse.py::TestSparseCUDA::test_floor_divide_by_sparse_error_cuda PASSED [0.0018s] [ 51%]
2025-12-04T14:00:07.9236280Z test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_complex128 PASSED [0.0429s] [ 51%]
2025-12-04T14:00:07.9236566Z test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_float64 PASSED [0.0317s] [ 51%]
2025-12-04T14:00:07.9236807Z test_sparse.py::TestSparseCUDA::test_hsmm_cuda_float64 PASSED [0.0214s] [ 51%]
2025-12-04T14:00:07.9237083Z test_sparse.py::TestSparseCUDA::test_index_select_cuda_complex128 PASSED [0.1254s] [ 51%]
2025-12-04T14:00:07.9237345Z test_sparse.py::TestSparseCUDA::test_index_select_cuda_float64 PASSED [0.1184s] [ 51%]
2025-12-04T14:00:07.9237744Z test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_complex128 PASSED [0.0114s] [ 51%]
2025-12-04T14:00:07.9238181Z test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_float64 PASSED [0.0108s] [ 51%]
2025-12-04T14:00:07.9238594Z test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_complex128 PASSED [0.1502s] [ 51%]
2025-12-04T14:00:07.9238994Z test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_float64 PASSED [0.1327s] [ 51%]
2025-12-04T14:00:07.9239381Z test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_complex128 PASSED [0.6841s] [ 51%]
2025-12-04T14:00:07.9239751Z test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_float64 PASSED [0.6462s] [ 51%]
2025-12-04T14:00:07.9240189Z test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_complex128 SKIPPED [0.0015s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9240610Z test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_float64 SKIPPED [0.0013s] (Only runs on cpu) [ 51%]
2025-12-04T14:00:07.9240852Z test_sparse.py::TestSparseCUDA::test_is_nonzero_cuda PASSED [0.0034s] [ 51%]
2025-12-04T14:00:07.9241093Z test_sparse.py::TestSparseCUDA::test_is_sparse_cuda PASSED [0.0014s] [ 52%]
2025-12-04T14:00:07.9241331Z test_sparse.py::TestSparseCUDA::test_isnan_cuda PASSED [0.0036s] [ 52%]
2025-12-04T14:00:07.9241568Z test_sparse.py::TestSparseCUDA::test_legacy_new_cuda PASSED [0.0019s] [ 52%]
2025-12-04T14:00:07.9241951Z test_sparse.py::TestSparseCUDA::test_legacy_new_device_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 52%]
2025-12-04T14:00:07.9242187Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_float32 PASSED [0.0053s] [ 52%]
2025-12-04T14:00:07.9242465Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_float64 PASSED [0.0051s] [ 52%]
2025-12-04T14:00:07.9242708Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_int16 PASSED [0.0046s] [ 52%]
2025-12-04T14:00:07.9242939Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_int32 PASSED [0.0050s] [ 52%]
2025-12-04T14:00:07.9243179Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_int64 PASSED [0.0045s] [ 52%]
2025-12-04T14:00:07.9243410Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_int8 PASSED [0.0045s] [ 52%]
2025-12-04T14:00:07.9243644Z test_sparse.py::TestSparseCUDA::test_log1p_cuda_uint8 PASSED [0.0045s] [ 52%]
2025-12-04T14:00:07.9243930Z test_sparse.py::TestSparseCUDA::test_log_softmax_float_cuda_float32 PASSED [0.0056s] [ 52%]
2025-12-04T14:00:07.9244224Z test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float32 PASSED [0.0037s] [ 52%]
2025-12-04T14:00:07.9244513Z test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float64 PASSED [0.0064s] [ 52%]
2025-12-04T14:00:07.9244769Z test_sparse.py::TestSparseCUDA::test_mm_cuda_complex128 PASSED [0.1149s] [ 52%]
2025-12-04T14:00:07.9245003Z test_sparse.py::TestSparseCUDA::test_mm_cuda_float64 PASSED [0.0382s] [ 52%]
2025-12-04T14:00:07.9245239Z test_sparse.py::TestSparseCUDA::test_mv_cuda_float64 PASSED [0.0336s] [ 52%]
2025-12-04T14:00:07.9245497Z test_sparse.py::TestSparseCUDA::test_narrow_cuda_complex128 PASSED [0.0559s] [ 52%]
2025-12-04T14:00:07.9245735Z test_sparse.py::TestSparseCUDA::test_narrow_cuda_float64 PASSED [0.0529s] [ 52%]
2025-12-04T14:00:07.9246019Z test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_complex128 PASSED [0.0143s] [ 52%]
2025-12-04T14:00:07.9246278Z test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_float64 PASSED [0.0134s] [ 52%]
2025-12-04T14:00:07.9246527Z test_sparse.py::TestSparseCUDA::test_negative_indices_cuda PASSED [0.0017s] [ 52%]
2025-12-04T14:00:07.9246773Z test_sparse.py::TestSparseCUDA::test_new_cuda_complex128 PASSED [0.0051s] [ 52%]
2025-12-04T14:00:07.9247009Z test_sparse.py::TestSparseCUDA::test_new_cuda_float64 PASSED [0.0049s] [ 52%]
2025-12-04T14:00:07.9247380Z test_sparse.py::TestSparseCUDA::test_new_device_multi_gpu_cuda SKIPPED [0.0002s] (only one GPU detected) [ 52%]
2025-12-04T14:00:07.9247641Z test_sparse.py::TestSparseCUDA::test_new_device_single_gpu_cuda PASSED [0.0019s] [ 52%]
2025-12-04T14:00:07.9247932Z test_sparse.py::TestSparseCUDA::test_norm_cuda_complex128 PASSED [0.0321s] [ 52%]
2025-12-04T14:00:07.9248170Z test_sparse.py::TestSparseCUDA::test_norm_cuda_float64 PASSED [0.0119s] [ 52%]
2025-12-04T14:00:07.9248494Z test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_complex128 PASSED [3.5504s] [ 52%]
2025-12-04T14:00:07.9248780Z test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_float64 PASSED [1.5578s] [ 52%]
2025-12-04T14:00:07.9249107Z test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_complex128 PASSED [3.9389s] [ 52%]
2025-12-04T14:00:07.9249374Z test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_float64 PASSED [1.5074s] [ 53%]
2025-12-04T14:00:07.9249623Z test_sparse.py::TestSparseCUDA::test_pickle_cuda_float64 PASSED [0.0168s] [ 53%]
2025-12-04T14:00:07.9249896Z test_sparse.py::TestSparseCUDA::test_print_coalesced_cuda_float64 PASSED [0.0153s] [ 53%]
2025-12-04T14:00:07.9250180Z test_sparse.py::TestSparseCUDA::test_print_uncoalesced_cuda_float64 PASSED [0.0146s] [ 53%]
2025-12-04T14:00:07.9250423Z test_sparse.py::TestSparseCUDA::test_resize_as_cuda PASSED [0.0022s] [ 53%]
2025-12-04T14:00:07.9250676Z test_sparse.py::TestSparseCUDA::test_resize_cuda_complex128 PASSED [0.0093s] [ 53%]
2025-12-04T14:00:07.9250919Z test_sparse.py::TestSparseCUDA::test_resize_cuda_float64 PASSED [0.0087s] [ 53%]
2025-12-04T14:00:07.9251290Z test_sparse.py::TestSparseCUDA::test_saddmm_cuda_complex128 SKIPPED [0.0013s] (Only runs on cpu) [ 53%]
2025-12-04T14:00:07.9251608Z test_sparse.py::TestSparseCUDA::test_saddmm_cuda_float64 SKIPPED [0.0018s] (Only runs on cpu) [ 53%]
2025-12-04T14:00:07.9252003Z test_sparse.py::TestSparseCUDA::test_same_gpu_cuda SKIPPED [0.0012s] (fewer than 2 devices detected) [ 53%]
2025-12-04T14:00:07.9252255Z test_sparse.py::TestSparseCUDA::test_scalar_cuda_complex128 PASSED [0.0051s] [ 53%]
2025-12-04T14:00:07.9252501Z test_sparse.py::TestSparseCUDA::test_scalar_cuda_float64 PASSED [0.0047s] [ 53%]
2025-12-04T14:00:07.9252753Z test_sparse.py::TestSparseCUDA::test_select_cuda_complex128 PASSED [0.1155s] [ 53%]
2025-12-04T14:00:07.9252996Z test_sparse.py::TestSparseCUDA::test_select_cuda_float64 PASSED [0.1106s] [ 53%]
2025-12-04T14:00:07.9253302Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int16 PASSED [0.0031s] [ 53%]
2025-12-04T14:00:07.9253608Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int32 PASSED [0.0022s] [ 53%]
2025-12-04T14:00:07.9253910Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int64 PASSED [0.0021s] [ 53%]
2025-12-04T14:00:07.9254204Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int8 PASSED [0.0021s] [ 53%]
2025-12-04T14:00:07.9254501Z test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_uint8 PASSED [0.0021s] [ 53%]
2025-12-04T14:00:07.9254765Z test_sparse.py::TestSparseCUDA::test_shared_cuda_complex128 PASSED [0.0028s] [ 53%]
2025-12-04T14:00:07.9255004Z test_sparse.py::TestSparseCUDA::test_shared_cuda_float64 PASSED [0.0026s] [ 53%]
2025-12-04T14:00:07.9255264Z test_sparse.py::TestSparseCUDA::test_small_nnz_coalesced_cuda PASSED [0.0021s] [ 53%]
2025-12-04T14:00:07.9255511Z test_sparse.py::TestSparseCUDA::test_softmax_cuda_float64 PASSED [1.4314s] [ 53%]
2025-12-04T14:00:07.9255785Z test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float32 PASSED [0.0034s] [ 53%]
2025-12-04T14:00:07.9256069Z test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float64 PASSED [0.0057s] [ 53%]
2025-12-04T14:00:07.9256306Z test_sparse.py::TestSparseCUDA::test_spadd_cuda_float64 PASSED [0.1069s] [ 53%]
2025-12-04T14:00:07.9256614Z test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex128 PASSED [0.0025s] [ 53%]
2025-12-04T14:00:07.9256923Z test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex64 PASSED [0.0029s] [ 53%]
2025-12-04T14:00:07.9257213Z test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float32 PASSED [0.0023s] [ 53%]
2025-12-04T14:00:07.9257554Z test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float64 PASSED [0.0023s] [ 53%]
2025-12-04T14:00:07.9257856Z test_sparse.py::TestSparseCUDA::test_sparse_add_out_bfloat16_cuda_float32 PASSED [0.0053s] [ 53%]
2025-12-04T14:00:07.9258412Z test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_bfloat16 SKIPPED [0.0013s] (addmm_sparse_cuda is not implemented for BFloat16 and Half) [ 54%]
2025-12-04T14:00:07.9258748Z test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_complex128 PASSED [7.2898s] [ 54%]
2025-12-04T14:00:07.9259309Z test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float16 SKIPPED [0.0016s] (addmm_sparse_cuda is not implemented for BFloat16 and Half) [ 54%]
2025-12-04T14:00:07.9259578Z test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float64 PASSED [2.5909s] [ 54%]
2025-12-04T14:00:07.9259845Z test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_complex128 PASSED [0.0021s] [ 54%]
2025-12-04T14:00:07.9260100Z test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_float64 PASSED [0.0018s] [ 54%]
2025-12-04T14:00:07.9260412Z test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_complex128 PASSED [0.0145s] [ 54%]
2025-12-04T14:00:07.9260707Z test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_float64 PASSED [0.0141s] [ 54%]
2025-12-04T14:00:07.9261314Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bfloat16 SKIPPED [0.0944s] (Test with dtype=torch.bfloat16, device=cuda:0 runs only with coalesced inputs) [ 54%]
2025-12-04T14:00:07.9261948Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bool SKIPPED [0.0058s] (Test with dtype=torch.bool, device=cuda:0 runs only with coalesced inputs) [ 54%]
2025-12-04T14:00:07.9262288Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex128 PASSED [0.2186s] [ 54%]
2025-12-04T14:00:07.9262579Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex64 PASSED [0.2173s] [ 54%]
2025-12-04T14:00:07.9263167Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float16 SKIPPED [0.0943s] (Test with dtype=torch.float16, device=cuda:0 runs only with coalesced inputs) [ 54%]
2025-12-04T14:00:07.9263448Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float32 PASSED [0.2047s] [ 54%]
2025-12-04T14:00:07.9263728Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float64 PASSED [0.2066s] [ 54%]
2025-12-04T14:00:07.9263992Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int16 PASSED [0.1558s] [ 54%]
2025-12-04T14:00:07.9264259Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int32 PASSED [0.1561s] [ 54%]
2025-12-04T14:00:07.9264524Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int64 PASSED [0.1554s] [ 54%]
2025-12-04T14:00:07.9264786Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int8 PASSED [0.1563s] [ 54%]
2025-12-04T14:00:07.9265050Z test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_uint8 PASSED [0.1555s] [ 54%]
2025-12-04T14:00:07.9265357Z test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_complex128 PASSED [3.7690s] [ 54%]
2025-12-04T14:00:07.9265657Z test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_float64 PASSED [1.4600s] [ 54%]
2025-12-04T14:00:07.9265922Z test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_complex128 PASSED [0.0578s] [ 54%]
2025-12-04T14:00:07.9266180Z test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_float64 PASSED [0.0548s] [ 54%]
2025-12-04T14:00:07.9266487Z test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_complex128 PASSED [0.0629s] [ 54%]
2025-12-04T14:00:07.9266774Z test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_float64 PASSED [0.0613s] [ 54%]
2025-12-04T14:00:07.9267044Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_bfloat16 PASSED [0.8084s] [ 54%]
2025-12-04T14:00:07.9267329Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex128 PASSED [46.8235s] [ 54%]
2025-12-04T14:00:07.9267600Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex64 PASSED [0.6948s] [ 54%]
2025-12-04T14:00:07.9267864Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float16 PASSED [0.8018s] [ 54%]
2025-12-04T14:00:07.9268169Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float32 PASSED [0.7047s] [ 54%]
2025-12-04T14:00:07.9268485Z test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float64 PASSED [18.6624s] [ 55%]
2025-12-04T14:00:07.9268766Z test_sparse.py::TestSparseCUDA::test_sparse_mm_cuda_float64 PASSED [0.7153s] [ 55%]
2025-12-04T14:00:07.9269152Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.2343s] [ 55%]
2025-12-04T14:00:07.9269514Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.4105s] [ 55%]
2025-12-04T14:00:07.9269794Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 FAILED [0.3886s] [ 55%]
2025-12-04T14:00:07.9269801Z
2025-12-04T14:00:07.9273680Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9273936Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9274048Z Traceback (most recent call last):
2025-12-04T14:00:07.9274346Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9274443Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9274701Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9274998Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9275239Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9275372Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9275884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9276043Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9276446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9276551Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9276983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9277076Z _gradcheck_real_imag(
2025-12-04T14:00:07.9277523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9277603Z gradcheck_fn(
2025-12-04T14:00:07.9278024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9278121Z raise GradcheckError(
2025-12-04T14:00:07.9278482Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9278601Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9278685Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9278771Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9278842Z ...,
2025-12-04T14:00:07.9278920Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9279006Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9279152Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9279353Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9279490Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9279613Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9279685Z ...,
2025-12-04T14:00:07.9279809Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000],
2025-12-04T14:00:07.9279928Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000],
2025-12-04T14:00:07.9280089Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]],
2025-12-04T14:00:07.9280202Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9280257Z
2025-12-04T14:00:07.9280261Z
2025-12-04T14:00:07.9280446Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9281018Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9281024Z
2025-12-04T14:00:07.9281253Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9281478Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9281579Z Traceback (most recent call last):
2025-12-04T14:00:07.9281859Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9281955Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9282209Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9282349Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9282585Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9282722Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9283175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9283332Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9283771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9283874Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9284348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9284439Z _gradcheck_real_imag(
2025-12-04T14:00:07.9284886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9284964Z gradcheck_fn(
2025-12-04T14:00:07.9285387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9285475Z raise GradcheckError(
2025-12-04T14:00:07.9285837Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9285956Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9286039Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9286128Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9286198Z ...,
2025-12-04T14:00:07.9286277Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9286363Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9286501Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9286698Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9286831Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9286955Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9287030Z ...,
2025-12-04T14:00:07.9287155Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000],
2025-12-04T14:00:07.9287275Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000],
2025-12-04T14:00:07.9287407Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]],
2025-12-04T14:00:07.9287515Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9287520Z
2025-12-04T14:00:07.9287524Z
2025-12-04T14:00:07.9287706Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9288234Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9288239Z
2025-12-04T14:00:07.9288458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9288650Z =================================== FAILURES ===================================
2025-12-04T14:00:07.9288893Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9289035Z Traceback (most recent call last):
2025-12-04T14:00:07.9289317Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9289404Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9289659Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9289791Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9290025Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9290157Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9290601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9290758Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9291150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9291251Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9291686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9291844Z _gradcheck_real_imag(
2025-12-04T14:00:07.9292293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9292413Z gradcheck_fn(
2025-12-04T14:00:07.9292830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9292927Z raise GradcheckError(
2025-12-04T14:00:07.9293280Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9293403Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9293489Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9293569Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9293637Z ...,
2025-12-04T14:00:07.9293718Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9293794Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9293935Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9294136Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9294260Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9294386Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9294457Z ...,
2025-12-04T14:00:07.9294582Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000],
2025-12-04T14:00:07.9294707Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000],
2025-12-04T14:00:07.9294837Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]],
2025-12-04T14:00:07.9294939Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9294944Z
2025-12-04T14:00:07.9294956Z
2025-12-04T14:00:07.9295137Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9295661Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9295666Z
2025-12-04T14:00:07.9295887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9296380Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.xml -
2025-12-04T14:00:07.9296520Z =========================== short test summary info ============================
2025-12-04T14:00:07.9297226Z FAILED [0.3886s] test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9297391Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9297537Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9297619Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9297688Z ...,
2025-12-04T14:00:07.9297766Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9297846Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9297981Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9298179Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9298307Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9298431Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9298501Z ...,
2025-12-04T14:00:07.9298628Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000],
2025-12-04T14:00:07.9298751Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000],
2025-12-04T14:00:07.9298904Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]],
2025-12-04T14:00:07.9299096Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9299101Z
2025-12-04T14:00:07.9299109Z
2025-12-04T14:00:07.9299339Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9299864Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9299907Z
2025-12-04T14:00:07.9300131Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9300276Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9300470Z ======= 1 failed, 1503 passed, 203 skipped, 2 rerun in 228.71s (0:03:48) =======
2025-12-04T14:00:07.9300552Z Got exit code 1
2025-12-04T14:00:07.9300638Z Retrying single test...
2025-12-04T14:00:07.9300984Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.xml
2025-12-04T14:00:07.9301117Z ============================= test session starts ==============================
2025-12-04T14:00:07.9301409Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9301499Z cachedir: .pytest_cache
2025-12-04T14:00:07.9301945Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9302047Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9302136Z configfile: pytest.ini
2025-12-04T14:00:07.9302595Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9302788Z collecting ... collected 3100 items / 3099 deselected / 1 selected
2025-12-04T14:00:07.9303259Z stepcurrent: skipping 1706 already run items. Running only test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9303352Z Running 1 items in this shard
2025-12-04T14:00:07.9303356Z
2025-12-04T14:00:07.9303721Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.4337s] [100%]
2025-12-04T14:00:07.9304081Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.3819s] [100%]
2025-12-04T14:00:07.9304367Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 FAILED [0.3789s] [100%]
2025-12-04T14:00:07.9304374Z
2025-12-04T14:00:07.9304486Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9304704Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9304804Z Traceback (most recent call last):
2025-12-04T14:00:07.9305136Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9305225Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9305525Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9305662Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9305903Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9306033Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9306482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9306643Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9307035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9307135Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9307568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9307655Z _gradcheck_real_imag(
2025-12-04T14:00:07.9308494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9308577Z gradcheck_fn(
2025-12-04T14:00:07.9309135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9313287Z raise GradcheckError(
2025-12-04T14:00:07.9313669Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9313955Z numerical:tensor([[0.6700, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314092Z [0.0000, 0.5920, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314216Z [0.0000, 0.0000, 0.1134, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314294Z ...,
2025-12-04T14:00:07.9314418Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314538Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9314670Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
2025-12-04T14:00:07.9314785Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9314993Z analytical:tensor([[0.6700, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315116Z [0.0000, 0.5920, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315257Z [0.0000, 0.0000, 0.1134, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315334Z ...,
2025-12-04T14:00:07.9315456Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315580Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9315708Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
2025-12-04T14:00:07.9315824Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9315829Z
2025-12-04T14:00:07.9315833Z
2025-12-04T14:00:07.9316019Z To execute this test, run the following from the base repo
dir: 2025-12-04T14:00:07.9316587Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64 2025-12-04T14:00:07.9316592Z 2025-12-04T14:00:07.9316831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:00:07.9317078Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________ 2025-12-04T14:00:07.9317187Z Traceback (most recent call last): 2025-12-04T14:00:07.9317489Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul 2025-12-04T14:00:07.9317582Z test_shape(2, 3, [2, 3, 4, 5]) 2025-12-04T14:00:07.9317839Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape 2025-12-04T14:00:07.9317974Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b]) 2025-12-04T14:00:07.9318279Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped 2025-12-04T14:00:07.9318410Z return gradcheck_fn(fn, inputs, *args, **kwargs) 2025-12-04T14:00:07.9318964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck 2025-12-04T14:00:07.9319126Z return torch.autograd.gradcheck(fn, inputs, **kwargs) 2025-12-04T14:00:07.9319519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck 2025-12-04T14:00:07.9319623Z return _gradcheck_helper(**args) 2025-12-04T14:00:07.9320062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper 2025-12-04T14:00:07.9320151Z _gradcheck_real_imag( 2025-12-04T14:00:07.9320599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag 2025-12-04T14:00:07.9320680Z gradcheck_fn( 2025-12-04T14:00:07.9321099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 
1659, in _slow_gradcheck 2025-12-04T14:00:07.9321194Z raise GradcheckError( 2025-12-04T14:00:07.9321553Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0, 2025-12-04T14:00:07.9321672Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9321753Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9321926Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9321998Z ..., 2025-12-04T14:00:07.9322119Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9322198Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9322340Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9322539Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9322673Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9322799Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9322869Z ..., 2025-12-04T14:00:07.9322996Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000], 2025-12-04T14:00:07.9323116Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000], 2025-12-04T14:00:07.9323244Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]], 2025-12-04T14:00:07.9323353Z device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9323358Z 2025-12-04T14:00:07.9323362Z 2025-12-04T14:00:07.9323544Z To execute this test, run the following from the base repo dir: 2025-12-04T14:00:07.9324069Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64 2025-12-04T14:00:07.9324073Z 2025-12-04T14:00:07.9324295Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:00:07.9324412Z =================================== FAILURES =================================== 2025-12-04T14:00:07.9324634Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________ 2025-12-04T14:00:07.9324732Z 
Traceback (most recent call last): 2025-12-04T14:00:07.9325010Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul 2025-12-04T14:00:07.9325098Z test_shape(2, 3, [2, 3, 4, 5]) 2025-12-04T14:00:07.9325349Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape 2025-12-04T14:00:07.9325486Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b]) 2025-12-04T14:00:07.9325722Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped 2025-12-04T14:00:07.9325851Z return gradcheck_fn(fn, inputs, *args, **kwargs) 2025-12-04T14:00:07.9326296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck 2025-12-04T14:00:07.9326502Z return torch.autograd.gradcheck(fn, inputs, **kwargs) 2025-12-04T14:00:07.9326897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck 2025-12-04T14:00:07.9327043Z return _gradcheck_helper(**args) 2025-12-04T14:00:07.9327477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper 2025-12-04T14:00:07.9327568Z _gradcheck_real_imag( 2025-12-04T14:00:07.9328017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag 2025-12-04T14:00:07.9328099Z gradcheck_fn( 2025-12-04T14:00:07.9328527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck 2025-12-04T14:00:07.9328617Z raise GradcheckError( 2025-12-04T14:00:07.9328970Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0, 2025-12-04T14:00:07.9329089Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9329169Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9329248Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9329320Z ..., 
2025-12-04T14:00:07.9329399Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9329478Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9329614Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9329867Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9330067Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9330188Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9330259Z ...,
2025-12-04T14:00:07.9330380Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000],
2025-12-04T14:00:07.9330503Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000],
2025-12-04T14:00:07.9330631Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]],
2025-12-04T14:00:07.9330733Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9330741Z
2025-12-04T14:00:07.9330745Z
2025-12-04T14:00:07.9330922Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9331450Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9331458Z
2025-12-04T14:00:07.9331680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9332175Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.xml -
2025-12-04T14:00:07.9332314Z =========================== short test summary info ============================
2025-12-04T14:00:07.9333017Z FAILED [0.3789s] test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9333137Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9333220Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9333300Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9333370Z ...,
2025-12-04T14:00:07.9333445Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9333527Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9333669Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9333867Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9333994Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9334115Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9334185Z ...,
2025-12-04T14:00:07.9334356Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000],
2025-12-04T14:00:07.9334475Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000],
2025-12-04T14:00:07.9334644Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]],
2025-12-04T14:00:07.9334751Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9334756Z
2025-12-04T14:00:07.9334760Z
2025-12-04T14:00:07.9334940Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9335462Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9335470Z
2025-12-04T14:00:07.9335690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9335841Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9336007Z ================= 1 failed, 3099 deselected, 2 rerun in 1.38s ==================
2025-12-04T14:00:07.9336085Z Got exit code 1
2025-12-04T14:00:07.9336175Z Retrying single test...
2025-12-04T14:00:07.9336519Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.xml
2025-12-04T14:00:07.9336657Z ============================= test session starts ==============================
2025-12-04T14:00:07.9336948Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9337034Z cachedir: .pytest_cache
2025-12-04T14:00:07.9337531Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9337670Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9337755Z configfile: pytest.ini
2025-12-04T14:00:07.9338216Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9338407Z collecting ... collected 3100 items / 3099 deselected / 1 selected
2025-12-04T14:00:07.9338903Z stepcurrent: skipping 1706 already run items. Running only test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9339075Z Running 1 items in this shard
2025-12-04T14:00:07.9339081Z
2025-12-04T14:00:07.9339445Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.4380s] [100%]
2025-12-04T14:00:07.9339810Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 ('RERUN', {'yellow': True}) [0.3848s] [100%]
2025-12-04T14:00:07.9340092Z test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 FAILED [0.3816s] [100%]
2025-12-04T14:00:07.9340100Z
2025-12-04T14:00:07.9340214Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9340435Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9340533Z Traceback (most recent call last):
2025-12-04T14:00:07.9340814Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9340898Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9341153Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9341291Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9341524Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9341658Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9342110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9342268Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9342663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9342760Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9343245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9343332Z _gradcheck_real_imag(
2025-12-04T14:00:07.9343820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9343902Z gradcheck_fn(
2025-12-04T14:00:07.9344319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9344410Z raise GradcheckError(
2025-12-04T14:00:07.9344767Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9344957Z numerical:tensor([[0.6700, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345078Z [0.0000, 0.5920, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345193Z [0.0000, 0.0000, 0.1134, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345266Z ...,
2025-12-04T14:00:07.9345382Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345493Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9345613Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
2025-12-04T14:00:07.9345722Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9345910Z analytical:tensor([[0.6700, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346076Z [0.0000, 0.5920, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346188Z [0.0000, 0.0000, 0.1134, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346298Z ...,
2025-12-04T14:00:07.9346410Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346519Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9346637Z [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
2025-12-04T14:00:07.9346748Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9346752Z
2025-12-04T14:00:07.9346756Z
2025-12-04T14:00:07.9346930Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9347460Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9347465Z
2025-12-04T14:00:07.9347685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9347911Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9348011Z Traceback (most recent call last):
2025-12-04T14:00:07.9348290Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9348374Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9348629Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9348789Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9349048Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9349177Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9349625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9349781Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9350177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9350278Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9350711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9350798Z _gradcheck_real_imag(
2025-12-04T14:00:07.9351247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9351374Z gradcheck_fn(
2025-12-04T14:00:07.9351792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9351926Z raise GradcheckError(
2025-12-04T14:00:07.9352281Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9352401Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9352481Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9352561Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9352631Z ...,
2025-12-04T14:00:07.9352709Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9352792Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9352938Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9353139Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9353272Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9353400Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9353475Z ...,
2025-12-04T14:00:07.9353607Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000],
2025-12-04T14:00:07.9353733Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000],
2025-12-04T14:00:07.9353859Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]],
2025-12-04T14:00:07.9354017Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9354022Z
2025-12-04T14:00:07.9354063Z
2025-12-04T14:00:07.9354241Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9354774Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9354779Z
2025-12-04T14:00:07.9355008Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9355126Z =================================== FAILURES ===================================
2025-12-04T14:00:07.9355355Z ______________ TestSparseCUDA.test_sparse_mul_masked_cuda_float64 ______________
2025-12-04T14:00:07.9355454Z Traceback (most recent call last):
2025-12-04T14:00:07.9355734Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9355820Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9356074Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9356212Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9356447Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9356578Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9357027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9357184Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9357588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9357685Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9358117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9358210Z _gradcheck_real_imag(
2025-12-04T14:00:07.9358663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9358745Z gradcheck_fn(
2025-12-04T14:00:07.9359173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9359262Z raise GradcheckError(
2025-12-04T14:00:07.9359628Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9359795Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9359879Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9360002Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9360074Z ...,
2025-12-04T14:00:07.9360154Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9360238Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9360375Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9360584Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9360720Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9360842Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9360914Z ...,
2025-12-04T14:00:07.9361037Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000],
2025-12-04T14:00:07.9361160Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000],
2025-12-04T14:00:07.9361290Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]],
2025-12-04T14:00:07.9361397Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9361402Z
2025-12-04T14:00:07.9361406Z
2025-12-04T14:00:07.9361585Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9362186Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9362191Z
2025-12-04T14:00:07.9362416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9362949Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.xml -
2025-12-04T14:00:07.9363090Z =========================== short test summary info ============================
2025-12-04T14:00:07.9363792Z FAILED [0.3816s] test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9363913Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9363995Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9364077Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9364147Z ...,
2025-12-04T14:00:07.9364224Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9364307Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9364446Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9364652Z analytical:tensor([[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9364775Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9364898Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9364976Z ...,
2025-12-04T14:00:07.9365098Z [ 0.0000, 0.0000, 0.0000, ..., 1.0153, 0.0000, 0.0000],
2025-12-04T14:00:07.9365218Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.8483, 0.0000],
2025-12-04T14:00:07.9365354Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, -0.0211]],
2025-12-04T14:00:07.9365459Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9365463Z
2025-12-04T14:00:07.9365467Z
2025-12-04T14:00:07.9365648Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9366172Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9366179Z
2025-12-04T14:00:07.9366400Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9366549Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9366715Z ================= 1 failed, 3099 deselected, 2 rerun in 1.39s ==================
2025-12-04T14:00:07.9366841Z Got exit code 1
2025-12-04T14:00:07.9367152Z FAILED CONSISTENTLY: test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64
2025-12-04T14:00:07.9367545Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T14:00:07.9367888Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.xml
2025-12-04T14:00:07.9368027Z ============================= test session starts ==============================
2025-12-04T14:00:07.9368320Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9368412Z cachedir: .pytest_cache
2025-12-04T14:00:07.9368884Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9369010Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9369103Z configfile: pytest.ini
2025-12-04T14:00:07.9369560Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9369761Z collecting ... collected 3100 items / 1707 deselected / 1393 selected
2025-12-04T14:00:07.9369884Z stepcurrent: skipping 1707 already run items.
2025-12-04T14:00:07.9369977Z Running 1393 items in this shard
2025-12-04T14:00:07.9369986Z
2025-12-04T14:00:07.9370400Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.5012s] [ 0%]
2025-12-04T14:00:07.9370762Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4495s] [ 0%]
2025-12-04T14:00:07.9371087Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 FAILED [0.4447s] [ 0%]
2025-12-04T14:00:07.9371092Z
2025-12-04T14:00:07.9371203Z ==================================== RERUNS ====================================
2025-12-04T14:00:07.9371426Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9371532Z Traceback (most recent call last):
2025-12-04T14:00:07.9371815Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9371909Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9372162Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9372297Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9372543Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9372676Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9373131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9373294Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9373686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9373796Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9374228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9374316Z _gradcheck_real_imag(
2025-12-04T14:00:07.9374766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9374846Z gradcheck_fn(
2025-12-04T14:00:07.9375268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9375366Z raise GradcheckError(
2025-12-04T14:00:07.9375722Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
2025-12-04T14:00:07.9375924Z numerical:tensor([[ 0.9997, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9376107Z [ 0.0000, -0.8658, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9376233Z [ 0.0000, 0.0000, -0.9013, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9376314Z ...,
2025-12-04T14:00:07.9376480Z [ 0.0000, 0.0000, 0.0000, ..., -0.5610, 0.0000, 0.0000],
2025-12-04T14:00:07.9376604Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.9928, 0.0000],
2025-12-04T14:00:07.9376732Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.2065]],
2025-12-04T14:00:07.9376838Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9377041Z analytical:tensor([[ 0.9997, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9377167Z [ 0.0000, -0.8658, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9377292Z [ 0.0000, 0.0000, -0.9013, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9377370Z ...,
2025-12-04T14:00:07.9377489Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9377618Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
2025-12-04T14:00:07.9377747Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
2025-12-04T14:00:07.9377856Z device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9377860Z
2025-12-04T14:00:07.9377864Z
2025-12-04T14:00:07.9378045Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9378627Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9378634Z
2025-12-04T14:00:07.9378931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9379230Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9379330Z Traceback (most recent call last):
2025-12-04T14:00:07.9379610Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9379700Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9379951Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9380092Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9380325Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9380455Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9380909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9381066Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9381462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9381562Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9381993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9382086Z _gradcheck_real_imag(
2025-12-04T14:00:07.9382535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9382620Z gradcheck_fn(
2025-12-04T14:00:07.9383036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9383125Z raise GradcheckError( 2025-12-04T14:00:07.9383484Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1, 2025-12-04T14:00:07.9383602Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9383684Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9383767Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9383839Z ..., 2025-12-04T14:00:07.9383920Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9384001Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9384189Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9384313Z analytical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9384431Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9384507Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9384585Z ..., 2025-12-04T14:00:07.9384664Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9384740Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9384879Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9384884Z 2025-12-04T14:00:07.9384890Z 2025-12-04T14:00:07.9385070Z To execute this test, run the following from the base repo dir: 2025-12-04T14:00:07.9385595Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 2025-12-04T14:00:07.9385599Z 2025-12-04T14:00:07.9385822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:00:07.9385937Z =================================== FAILURES =================================== 2025-12-04T14:00:07.9386165Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________ 2025-12-04T14:00:07.9386264Z Traceback (most recent call last): 2025-12-04T14:00:07.9386543Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul 2025-12-04T14:00:07.9386631Z test_shape(2, 3, [2, 3, 4, 5]) 
2025-12-04T14:00:07.9386931Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape 2025-12-04T14:00:07.9387069Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b]) 2025-12-04T14:00:07.9387345Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped 2025-12-04T14:00:07.9387480Z return gradcheck_fn(fn, inputs, *args, **kwargs) 2025-12-04T14:00:07.9387929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck 2025-12-04T14:00:07.9388089Z return torch.autograd.gradcheck(fn, inputs, **kwargs) 2025-12-04T14:00:07.9388485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck 2025-12-04T14:00:07.9388584Z return _gradcheck_helper(**args) 2025-12-04T14:00:07.9389017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper 2025-12-04T14:00:07.9389106Z _gradcheck_real_imag( 2025-12-04T14:00:07.9389553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag 2025-12-04T14:00:07.9389632Z gradcheck_fn( 2025-12-04T14:00:07.9390052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck 2025-12-04T14:00:07.9390144Z raise GradcheckError( 2025-12-04T14:00:07.9390504Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1, 2025-12-04T14:00:07.9390626Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9390708Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9390796Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9390867Z ..., 2025-12-04T14:00:07.9390944Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9391032Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9391168Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 
2025-12-04T14:00:07.9391293Z analytical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9391375Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9391457Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9391532Z ..., 2025-12-04T14:00:07.9391609Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9391687Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9391825Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9391877Z 2025-12-04T14:00:07.9391881Z 2025-12-04T14:00:07.9392061Z To execute this test, run the following from the base repo dir: 2025-12-04T14:00:07.9392657Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 2025-12-04T14:00:07.9392663Z 2025-12-04T14:00:07.9392887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:00:07.9393375Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.xml - 2025-12-04T14:00:07.9393524Z =========================== short test summary info ============================ 2025-12-04T14:00:07.9394219Z FAILED [0.4447s] test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1, 2025-12-04T14:00:07.9394346Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9394422Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9394500Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9394574Z ..., 2025-12-04T14:00:07.9394654Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9394730Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9394868Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9394990Z analytical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9395116Z [0., 0., 0., ..., 0., 0., 0.], 
2025-12-04T14:00:07.9395197Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9395307Z ..., 2025-12-04T14:00:07.9395389Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9395467Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9395601Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9395608Z 2025-12-04T14:00:07.9395612Z 2025-12-04T14:00:07.9395796Z To execute this test, run the following from the base repo dir: 2025-12-04T14:00:07.9396319Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 2025-12-04T14:00:07.9396323Z 2025-12-04T14:00:07.9396555Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:00:07.9396702Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T14:00:07.9396871Z ================= 1 failed, 1707 deselected, 2 rerun in 1.59s ================== 2025-12-04T14:00:07.9396962Z Got exit code 1 2025-12-04T14:00:07.9397049Z Retrying single test... 
2025-12-04T14:00:07.9397390Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.xml 2025-12-04T14:00:07.9397526Z ============================= test session starts ============================== 2025-12-04T14:00:07.9397819Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T14:00:07.9397909Z cachedir: .pytest_cache 2025-12-04T14:00:07.9398355Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T14:00:07.9398459Z rootdir: /var/lib/jenkins/workspace 2025-12-04T14:00:07.9398555Z configfile: pytest.ini 2025-12-04T14:00:07.9399064Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T14:00:07.9399259Z collecting ... collected 3100 items / 3099 deselected / 1 selected 2025-12-04T14:00:07.9399731Z stepcurrent: skipping 1707 already run items. 
Running only test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 2025-12-04T14:00:07.9399823Z Running 1 items in this shard 2025-12-04T14:00:07.9399828Z 2025-12-04T14:00:07.9400194Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4911s] [100%] 2025-12-04T14:00:07.9400601Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4435s] [100%] 2025-12-04T14:00:07.9400930Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 FAILED [0.4391s] [100%] 2025-12-04T14:00:07.9400935Z 2025-12-04T14:00:07.9401052Z ==================================== RERUNS ==================================== 2025-12-04T14:00:07.9401274Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________ 2025-12-04T14:00:07.9401380Z Traceback (most recent call last): 2025-12-04T14:00:07.9401661Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul 2025-12-04T14:00:07.9401750Z test_shape(2, 3, [2, 3, 4, 5]) 2025-12-04T14:00:07.9402011Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape 2025-12-04T14:00:07.9402146Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b]) 2025-12-04T14:00:07.9402383Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped 2025-12-04T14:00:07.9402515Z return gradcheck_fn(fn, inputs, *args, **kwargs) 2025-12-04T14:00:07.9402968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck 2025-12-04T14:00:07.9403134Z return torch.autograd.gradcheck(fn, inputs, **kwargs) 2025-12-04T14:00:07.9403527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck 2025-12-04T14:00:07.9403683Z return _gradcheck_helper(**args) 2025-12-04T14:00:07.9404119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", 
line 2115, in _gradcheck_helper 2025-12-04T14:00:07.9404249Z _gradcheck_real_imag( 2025-12-04T14:00:07.9404698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag 2025-12-04T14:00:07.9404785Z gradcheck_fn( 2025-12-04T14:00:07.9405203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck 2025-12-04T14:00:07.9405295Z raise GradcheckError( 2025-12-04T14:00:07.9405651Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0, 2025-12-04T14:00:07.9405851Z numerical:tensor([[ 0.9997, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9405985Z [ 0.0000, -0.8658, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9406115Z [ 0.0000, 0.0000, -0.9013, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9406187Z ..., 2025-12-04T14:00:07.9406316Z [ 0.0000, 0.0000, 0.0000, ..., -0.5610, 0.0000, 0.0000], 2025-12-04T14:00:07.9406435Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.9928, 0.0000], 2025-12-04T14:00:07.9406563Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.2065]], 2025-12-04T14:00:07.9406673Z device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9406870Z analytical:tensor([[ 0.9997, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9406997Z [ 0.0000, -0.8658, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9407122Z [ 0.0000, 0.0000, -0.9013, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9407197Z ..., 2025-12-04T14:00:07.9407320Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9407439Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9407572Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]], 2025-12-04T14:00:07.9407681Z device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9407686Z 2025-12-04T14:00:07.9407690Z 2025-12-04T14:00:07.9408044Z To execute this test, run the following 
from the base repo dir: 2025-12-04T14:00:07.9408597Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 2025-12-04T14:00:07.9408775Z 2025-12-04T14:00:07.9408997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:00:07.9409281Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________ 2025-12-04T14:00:07.9409384Z Traceback (most recent call last): 2025-12-04T14:00:07.9409661Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul 2025-12-04T14:00:07.9409750Z test_shape(2, 3, [2, 3, 4, 5]) 2025-12-04T14:00:07.9410003Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape 2025-12-04T14:00:07.9410141Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b]) 2025-12-04T14:00:07.9410381Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped 2025-12-04T14:00:07.9410509Z return gradcheck_fn(fn, inputs, *args, **kwargs) 2025-12-04T14:00:07.9410958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck 2025-12-04T14:00:07.9411118Z return torch.autograd.gradcheck(fn, inputs, **kwargs) 2025-12-04T14:00:07.9411516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck 2025-12-04T14:00:07.9411621Z return _gradcheck_helper(**args) 2025-12-04T14:00:07.9412054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper 2025-12-04T14:00:07.9412147Z _gradcheck_real_imag( 2025-12-04T14:00:07.9412664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag 2025-12-04T14:00:07.9412803Z gradcheck_fn( 2025-12-04T14:00:07.9413225Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck 2025-12-04T14:00:07.9413313Z raise GradcheckError( 2025-12-04T14:00:07.9413669Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1, 2025-12-04T14:00:07.9413793Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9413874Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9413958Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9414034Z ..., 2025-12-04T14:00:07.9414111Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9414192Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9414330Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9414454Z analytical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9414536Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9414615Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9414688Z ..., 2025-12-04T14:00:07.9414774Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9414849Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9414986Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9414995Z 2025-12-04T14:00:07.9414999Z 2025-12-04T14:00:07.9415181Z To execute this test, run the following from the base repo dir: 2025-12-04T14:00:07.9415708Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 2025-12-04T14:00:07.9415713Z 2025-12-04T14:00:07.9415940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:00:07.9416057Z =================================== FAILURES =================================== 2025-12-04T14:00:07.9416286Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________ 2025-12-04T14:00:07.9416385Z Traceback (most recent call last): 2025-12-04T14:00:07.9416662Z File 
"/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul 2025-12-04T14:00:07.9416757Z test_shape(2, 3, [2, 3, 4, 5]) 2025-12-04T14:00:07.9417058Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape 2025-12-04T14:00:07.9417192Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b]) 2025-12-04T14:00:07.9417468Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped 2025-12-04T14:00:07.9417600Z return gradcheck_fn(fn, inputs, *args, **kwargs) 2025-12-04T14:00:07.9418050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck 2025-12-04T14:00:07.9418211Z return torch.autograd.gradcheck(fn, inputs, **kwargs) 2025-12-04T14:00:07.9418603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck 2025-12-04T14:00:07.9418708Z return _gradcheck_helper(**args) 2025-12-04T14:00:07.9419189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper 2025-12-04T14:00:07.9419278Z _gradcheck_real_imag( 2025-12-04T14:00:07.9419731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag 2025-12-04T14:00:07.9419811Z gradcheck_fn( 2025-12-04T14:00:07.9420234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck 2025-12-04T14:00:07.9420324Z raise GradcheckError( 2025-12-04T14:00:07.9420678Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1, 2025-12-04T14:00:07.9420852Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9420933Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9421052Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9421129Z ..., 2025-12-04T14:00:07.9421208Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9421283Z 
[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9421428Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9421554Z analytical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9421635Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9421716Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9421790Z ..., 2025-12-04T14:00:07.9421872Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9421949Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9422081Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9422085Z 2025-12-04T14:00:07.9422089Z 2025-12-04T14:00:07.9422275Z To execute this test, run the following from the base repo dir: 2025-12-04T14:00:07.9422803Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 2025-12-04T14:00:07.9422808Z 2025-12-04T14:00:07.9423034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:00:07.9423522Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.xml - 2025-12-04T14:00:07.9423664Z =========================== short test summary info ============================ 2025-12-04T14:00:07.9424370Z FAILED [0.4391s] test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1, 2025-12-04T14:00:07.9424487Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9424569Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9424650Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9424725Z ..., 2025-12-04T14:00:07.9424805Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9424888Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9425022Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 
2025-12-04T14:00:07.9425146Z analytical:tensor([[0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9425298Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9425381Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9425454Z ..., 2025-12-04T14:00:07.9425531Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9425652Z [0., 0., 0., ..., 0., 0., 0.], 2025-12-04T14:00:07.9425785Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9425789Z 2025-12-04T14:00:07.9425793Z 2025-12-04T14:00:07.9432145Z To execute this test, run the following from the base repo dir: 2025-12-04T14:00:07.9432721Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 2025-12-04T14:00:07.9432731Z 2025-12-04T14:00:07.9432969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:00:07.9433125Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T14:00:07.9433299Z ================= 1 failed, 3099 deselected, 2 rerun in 1.56s ================== 2025-12-04T14:00:07.9433383Z Got exit code 1 2025-12-04T14:00:07.9433475Z Retrying single test... 
2025-12-04T14:00:07.9433826Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.xml 2025-12-04T14:00:07.9433963Z ============================= test session starts ============================== 2025-12-04T14:00:07.9434262Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T14:00:07.9434434Z cachedir: .pytest_cache 2025-12-04T14:00:07.9434887Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T14:00:07.9435036Z rootdir: /var/lib/jenkins/workspace 2025-12-04T14:00:07.9435126Z configfile: pytest.ini 2025-12-04T14:00:07.9435593Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T14:00:07.9435789Z collecting ... collected 3100 items / 3099 deselected / 1 selected 2025-12-04T14:00:07.9436267Z stepcurrent: skipping 1707 already run items. 
Running only test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 2025-12-04T14:00:07.9436364Z Running 1 items in this shard 2025-12-04T14:00:07.9436369Z 2025-12-04T14:00:07.9436734Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4995s] [100%] 2025-12-04T14:00:07.9437106Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 ('RERUN', {'yellow': True}) [0.4364s] [100%] 2025-12-04T14:00:07.9437398Z test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 FAILED [0.4332s] [100%] 2025-12-04T14:00:07.9437403Z 2025-12-04T14:00:07.9437526Z ==================================== RERUNS ==================================== 2025-12-04T14:00:07.9437755Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________ 2025-12-04T14:00:07.9437861Z Traceback (most recent call last): 2025-12-04T14:00:07.9438153Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul 2025-12-04T14:00:07.9438248Z test_shape(2, 3, [2, 3, 4, 5]) 2025-12-04T14:00:07.9438510Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape 2025-12-04T14:00:07.9438652Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b]) 2025-12-04T14:00:07.9438892Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped 2025-12-04T14:00:07.9439038Z return gradcheck_fn(fn, inputs, *args, **kwargs) 2025-12-04T14:00:07.9439495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck 2025-12-04T14:00:07.9439663Z return torch.autograd.gradcheck(fn, inputs, **kwargs) 2025-12-04T14:00:07.9440069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck 2025-12-04T14:00:07.9440223Z return _gradcheck_helper(**args) 2025-12-04T14:00:07.9440663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", 
line 2115, in _gradcheck_helper 2025-12-04T14:00:07.9440804Z _gradcheck_real_imag( 2025-12-04T14:00:07.9441257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag 2025-12-04T14:00:07.9441344Z gradcheck_fn( 2025-12-04T14:00:07.9441769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck 2025-12-04T14:00:07.9441865Z raise GradcheckError( 2025-12-04T14:00:07.9442236Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0, 2025-12-04T14:00:07.9442445Z numerical:tensor([[ 0.9997, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9442583Z [ 0.0000, -0.8658, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9442722Z [ 0.0000, 0.0000, -0.9013, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9442798Z ..., 2025-12-04T14:00:07.9442933Z [ 0.0000, 0.0000, 0.0000, ..., -0.5610, 0.0000, 0.0000], 2025-12-04T14:00:07.9443065Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.9928, 0.0000], 2025-12-04T14:00:07.9443196Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.2065]], 2025-12-04T14:00:07.9443313Z device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9443563Z analytical:tensor([[ 0.9997, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9443696Z [ 0.0000, -0.8658, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9443874Z [ 0.0000, 0.0000, -0.9013, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9443951Z ..., 2025-12-04T14:00:07.9444081Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9444208Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000], 2025-12-04T14:00:07.9444342Z [ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]], 2025-12-04T14:00:07.9444459Z device='cuda:0', dtype=torch.float64) 2025-12-04T14:00:07.9444463Z 2025-12-04T14:00:07.9444470Z 2025-12-04T14:00:07.9444656Z To execute this test, run the following 
from the base repo dir:
2025-12-04T14:00:07.9445196Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9445202Z
2025-12-04T14:00:07.9445433Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9445666Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9445779Z Traceback (most recent call last):
2025-12-04T14:00:07.9446059Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9446151Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9446418Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9446559Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9446807Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9446945Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9447395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9447563Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9448054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9448215Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9448792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9448992Z _gradcheck_real_imag(
2025-12-04T14:00:07.9452794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9452897Z gradcheck_fn(
2025-12-04T14:00:07.9453392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9453497Z raise GradcheckError(
2025-12-04T14:00:07.9453857Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9453988Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9454077Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9454165Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9454246Z ...,
2025-12-04T14:00:07.9454329Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9454410Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9454559Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9454689Z analytical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9454769Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9454855Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9454932Z ...,
2025-12-04T14:00:07.9455015Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9455100Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9455241Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9455247Z
2025-12-04T14:00:07.9455304Z
2025-12-04T14:00:07.9455493Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9456069Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9456075Z
2025-12-04T14:00:07.9456301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9456426Z =================================== FAILURES ===================================
2025-12-04T14:00:07.9456651Z ______________ TestSparseCUDA.test_sparse_mul_sparse_cuda_float64 ______________
2025-12-04T14:00:07.9456762Z Traceback (most recent call last):
2025-12-04T14:00:07.9457045Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1785, in test_sparse_mul
2025-12-04T14:00:07.9477845Z test_shape(2, 3, [2, 3, 4, 5])
2025-12-04T14:00:07.9478122Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 1781, in test_shape
2025-12-04T14:00:07.9478264Z gradcheck(lambda x, y: (x * y).to_dense(), [a, b])
2025-12-04T14:00:07.9478529Z File "/var/lib/jenkins/workspace/test/test_sparse.py", line 101, in wrapped
2025-12-04T14:00:07.9478693Z return gradcheck_fn(fn, inputs, *args, **kwargs)
2025-12-04T14:00:07.9479146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 5182, in gradcheck
2025-12-04T14:00:07.9479313Z return torch.autograd.gradcheck(fn, inputs, **kwargs)
2025-12-04T14:00:07.9479710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2086, in gradcheck
2025-12-04T14:00:07.9479816Z return _gradcheck_helper(**args)
2025-12-04T14:00:07.9480254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2115, in _gradcheck_helper
2025-12-04T14:00:07.9480347Z _gradcheck_real_imag(
2025-12-04T14:00:07.9480801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1518, in _gradcheck_real_imag
2025-12-04T14:00:07.9480885Z gradcheck_fn(
2025-12-04T14:00:07.9481307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1659, in _slow_gradcheck
2025-12-04T14:00:07.9481402Z raise GradcheckError(
2025-12-04T14:00:07.9481761Z torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9481951Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9482040Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9482125Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9482201Z ...,
2025-12-04T14:00:07.9482326Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9482409Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9482555Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9482681Z analytical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9482763Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9482849Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9482927Z ...,
2025-12-04T14:00:07.9483009Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9483092Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9483229Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9483237Z
2025-12-04T14:00:07.9483241Z
2025-12-04T14:00:07.9483422Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9483946Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9483951Z
2025-12-04T14:00:07.9484173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9484667Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.xml -
2025-12-04T14:00:07.9484873Z =========================== short test summary info ============================
2025-12-04T14:00:07.9485651Z FAILED [0.4332s] test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 1,
2025-12-04T14:00:07.9485772Z numerical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9485854Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9485935Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9486003Z ...,
2025-12-04T14:00:07.9486079Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9486162Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9486296Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9486420Z analytical:tensor([[0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9486497Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9486577Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9486647Z ...,
2025-12-04T14:00:07.9486728Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9486805Z [0., 0., 0., ..., 0., 0., 0.],
2025-12-04T14:00:07.9486942Z [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0', dtype=torch.float64)
2025-12-04T14:00:07.9486947Z
2025-12-04T14:00:07.9486951Z
2025-12-04T14:00:07.9487132Z To execute this test, run the following from the base repo dir:
2025-12-04T14:00:07.9487666Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse.py TestSparseCUDA.test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9487671Z
2025-12-04T14:00:07.9487895Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T14:00:07.9488056Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T14:00:07.9488223Z ================= 1 failed, 3099 deselected, 2 rerun in 1.56s ==================
2025-12-04T14:00:07.9488306Z Got exit code 1
2025-12-04T14:00:07.9488628Z FAILED CONSISTENTLY: test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64
2025-12-04T14:00:07.9488983Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T14:00:07.9489339Z Test results will be stored in test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.xml
2025-12-04T14:00:07.9489522Z ============================= test session starts ==============================
2025-12-04T14:00:07.9489815Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T14:00:07.9489953Z cachedir: .pytest_cache
2025-12-04T14:00:07.9490400Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T14:00:07.9490502Z rootdir: /var/lib/jenkins/workspace
2025-12-04T14:00:07.9490595Z configfile: pytest.ini
2025-12-04T14:00:07.9491061Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T14:00:07.9491271Z collecting ... collected 3100 items / 1708 deselected / 1392 selected
2025-12-04T14:00:07.9491397Z stepcurrent: skipping 1708 already run items.
2025-12-04T14:00:07.9491494Z Running 1392 items in this shard
2025-12-04T14:00:07.9491499Z
2025-12-04T14:00:07.9492122Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_bfloat16 SKIPPED [0.1640s] (Test with dtype=torch.bfloat16, device=cuda:0 runs only with coalesced inputs) [ 0%]
2025-12-04T14:00:07.9492437Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex128 PASSED [0.1365s] [ 0%]
2025-12-04T14:00:07.9492740Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex64 PASSED [0.0583s] [ 0%]
2025-12-04T14:00:07.9493381Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float16 SKIPPED [0.0237s] (Test with dtype=torch.float16, device=cuda:0 runs only with coalesced inputs) [ 0%]
2025-12-04T14:00:07.9493672Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float32 PASSED [0.0545s] [ 0%]
2025-12-04T14:00:07.9494005Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float64 PASSED [0.0546s] [ 0%]
2025-12-04T14:00:07.9494277Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int16 PASSED [0.0518s] [ 0%]
2025-12-04T14:00:07.9494550Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int32 PASSED [0.0472s] [ 0%]
2025-12-04T14:00:07.9494832Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int64 PASSED [0.0471s] [ 0%]
2025-12-04T14:00:07.9495108Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int8 PASSED [0.0470s] [ 0%]
2025-12-04T14:00:07.9495389Z test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_uint8 PASSED [0.0468s] [ 0%]
2025-12-04T14:00:07.9495731Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_bool SKIPPED [0.0013s] (Only runs on cpu) [ 0%]
2025-12-04T14:00:07.9496104Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [ 0%]
2025-12-04T14:00:07.9496472Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex64 SKIPPED [0.0014s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9496823Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9497180Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9497523Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int16 SKIPPED [0.0012s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9497867Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int32 SKIPPED [0.0012s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9498217Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int64 SKIPPED [0.0014s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9498561Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int8 SKIPPED [0.0012s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9498961Z test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_uint8 SKIPPED [0.0012s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9499270Z test_sparse.py::TestSparseCUDA::test_sparse_sum_cuda_float64 PASSED [1.5036s] [ 1%]
2025-12-04T14:00:07.9499590Z test_sparse.py::TestSparseCUDA::test_sparse_to_numpy_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9499989Z test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_complex128 SKIPPED [0.0012s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9500317Z test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 1%]
2025-12-04T14:00:07.9500626Z test_sparse.py::TestSparseCUDA::test_storage_not_null_cuda PASSED [0.0017s] [ 1%]
2025-12-04T14:00:07.9500867Z test_sparse.py::TestSparseCUDA::test_sum_cuda_bool PASSED [0.0151s] [ 1%]
2025-12-04T14:00:07.9501110Z test_sparse.py::TestSparseCUDA::test_sum_cuda_complex128 PASSED [0.0253s] [ 2%]
2025-12-04T14:00:07.9501363Z test_sparse.py::TestSparseCUDA::test_sum_cuda_complex64 PASSED [0.0248s] [ 2%]
2025-12-04T14:00:07.9501607Z test_sparse.py::TestSparseCUDA::test_sum_cuda_float32 PASSED [0.0326s] [ 2%]
2025-12-04T14:00:07.9501847Z test_sparse.py::TestSparseCUDA::test_sum_cuda_float64 PASSED [0.0226s] [ 2%]
2025-12-04T14:00:07.9502086Z test_sparse.py::TestSparseCUDA::test_sum_cuda_int16 PASSED [0.0145s] [ 2%]
2025-12-04T14:00:07.9502322Z test_sparse.py::TestSparseCUDA::test_sum_cuda_int32 PASSED [0.0143s] [ 2%]
2025-12-04T14:00:07.9502561Z test_sparse.py::TestSparseCUDA::test_sum_cuda_int64 PASSED [0.0141s] [ 2%]
2025-12-04T14:00:07.9502798Z test_sparse.py::TestSparseCUDA::test_sum_cuda_int8 PASSED [0.0142s] [ 2%]
2025-12-04T14:00:07.9503030Z test_sparse.py::TestSparseCUDA::test_sum_cuda_uint8 PASSED [0.0143s] [ 2%]
2025-12-04T14:00:07.9503293Z test_sparse.py::TestSparseCUDA::test_t_empty_cuda_complex128 PASSED [0.0025s] [ 2%]
2025-12-04T14:00:07.9503579Z test_sparse.py::TestSparseCUDA::test_t_empty_cuda_float64 PASSED [0.0020s] [ 2%]
2025-12-04T14:00:07.9503937Z test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_complex128 PASSED [0.0836s] [ 2%]
2025-12-04T14:00:07.9504244Z test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_float64 PASSED [0.0388s] [ 2%]
2025-12-04T14:00:07.9504560Z test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_complex128 PASSED [0.0999s] [ 2%]
2025-12-04T14:00:07.9504867Z test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_float64 PASSED [0.0373s] [ 3%]
2025-12-04T14:00:07.9505209Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_bfloat16 PASSED [0.0888s] [ 3%]
2025-12-04T14:00:07.9505558Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex128 PASSED [0.0886s] [ 3%]
2025-12-04T14:00:07.9505908Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex64 PASSED [0.0885s] [ 3%]
2025-12-04T14:00:07.9506244Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float16 PASSED [0.0876s] [ 3%]
2025-12-04T14:00:07.9506583Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float32 PASSED [0.0883s] [ 3%]
2025-12-04T14:00:07.9506911Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float64 PASSED [0.1495s] [ 3%]
2025-12-04T14:00:07.9507246Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_bfloat16 PASSED [0.0787s] [ 3%]
2025-12-04T14:00:07.9507603Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex128 PASSED [0.0787s] [ 3%]
2025-12-04T14:00:07.9508228Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex64 PASSED [0.0784s] [ 3%]
2025-12-04T14:00:07.9508671Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float16 PASSED [0.0784s] [ 3%]
2025-12-04T14:00:07.9509011Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float32 PASSED [0.0785s] [ 3%]
2025-12-04T14:00:07.9509339Z test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float64 PASSED [0.1393s] [ 3%]
2025-12-04T14:00:07.9509605Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_bfloat16 PASSED [0.0856s] [ 3%]
2025-12-04T14:00:07.9509869Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex128 PASSED [0.0666s] [ 4%]
2025-12-04T14:00:07.9510136Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex64 PASSED [0.0669s] [ 4%]
2025-12-04T14:00:07.9510511Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float16 PASSED [0.0655s] [ 4%]
2025-12-04T14:00:07.9510819Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float64 PASSED [0.0648s] [ 4%]
2025-12-04T14:00:07.9511071Z test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_int32 PASSED [0.0528s] [ 4%]
2025-12-04T14:00:07.9511336Z test_sparse.py::TestSparseCUDA::test_transpose_cuda_complex128 PASSED [0.0381s] [ 4%]
2025-12-04T14:00:07.9511589Z test_sparse.py::TestSparseCUDA::test_transpose_cuda_float64 PASSED [0.0366s] [ 4%]
2025-12-04T14:00:07.9511860Z test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_complex128 PASSED [0.0258s] [ 4%]
2025-12-04T14:00:07.9512115Z test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_float64 PASSED [0.0245s] [ 4%]
2025-12-04T14:00:07.9512368Z test_sparse.py::TestSparseCUDA::test_zeros_cuda_complex128 PASSED [0.2423s] [ 4%]
2025-12-04T14:00:07.9512605Z test_sparse.py::TestSparseCUDA::test_zeros_cuda_float64 PASSED [0.2367s] [ 4%]
2025-12-04T14:00:07.9512874Z test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_complex128 PASSED [0.2489s] [ 4%]
2025-12-04T14:00:07.9513132Z test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_float64 PASSED [0.2487s] [ 4%]
2025-12-04T14:00:07.9513505Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_fast_cuda PASSED [0.5590s] [ 4%]
2025-12-04T14:00:07.9513886Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_slow_cuda PASSED [27.8082s] [ 5%]
2025-12-04T14:00:07.9514333Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_fast_cuda PASSED [0.5185s] [ 5%]
2025-12-04T14:00:07.9514777Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_slow_cuda PASSED [25.3798s] [ 5%]
2025-12-04T14:00:07.9515152Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_fast_cuda PASSED [0.4612s] [ 5%]
2025-12-04T14:00:07.9515530Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_slow_cuda PASSED [25.8149s] [ 5%]
2025-12-04T14:00:07.9515918Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_fast_cuda PASSED [0.4917s] [ 5%]
2025-12-04T14:00:07.9516309Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_slow_cuda PASSED [23.9239s] [ 5%]
2025-12-04T14:00:07.9516681Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_fast_cuda PASSED [0.8813s] [ 5%]
2025-12-04T14:00:07.9517061Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_slow_cuda PASSED [24.3449s] [ 5%]
2025-12-04T14:00:07.9517446Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_fast_cuda PASSED [1.0267s] [ 5%]
2025-12-04T14:00:07.9517843Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_slow_cuda PASSED [27.1667s] [ 5%]
2025-12-04T14:00:07.9518212Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_fast_cuda PASSED [0.4711s] [ 5%]
2025-12-04T14:00:07.9518611Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_slow_cuda PASSED [20.6427s] [ 5%]
2025-12-04T14:00:07.9519031Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_fast_cuda PASSED [0.4668s] [ 5%]
2025-12-04T14:00:07.9519423Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_slow_cuda PASSED [23.6739s] [ 6%]
2025-12-04T14:00:07.9519796Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_fast_cuda PASSED [0.4209s] [ 6%]
2025-12-04T14:00:07.9520174Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_slow_cuda PASSED [16.6649s] [ 6%]
2025-12-04T14:00:07.9520556Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_fast_cuda PASSED [0.4370s] [ 6%]
2025-12-04T14:00:07.9520952Z test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_slow_cuda PASSED [20.3160s] [ 6%]
2025-12-04T14:00:07.9521360Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bfloat16 PASSED [0.0948s] [ 6%]
2025-12-04T14:00:07.9521749Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bool PASSED [0.0154s] [ 6%]
2025-12-04T14:00:07.9522133Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex128 PASSED [0.0180s] [ 6%]
2025-12-04T14:00:07.9522503Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex32 PASSED [0.8402s] [ 6%]
2025-12-04T14:00:07.9522875Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex64 PASSED [0.0165s] [ 6%]
2025-12-04T14:00:07.9523240Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float16 PASSED [0.0155s] [ 6%]
2025-12-04T14:00:07.9523601Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float32 PASSED [0.0149s] [ 6%]
2025-12-04T14:00:07.9523965Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float64 PASSED [0.0160s] [ 6%]
2025-12-04T14:00:07.9524315Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int16 PASSED [0.0151s] [ 6%]
2025-12-04T14:00:07.9524670Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int32 PASSED [0.0147s] [ 7%]
2025-12-04T14:00:07.9525015Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int64 PASSED [0.0152s] [ 7%]
2025-12-04T14:00:07.9525432Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int8 PASSED [0.0155s] [ 7%]
2025-12-04T14:00:07.9525825Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_uint8 PASSED [0.0150s] [ 7%]
2025-12-04T14:00:07.9526186Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bfloat16 PASSED [0.0140s] [ 7%]
2025-12-04T14:00:07.9526536Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bool PASSED [0.0144s] [ 7%]
2025-12-04T14:00:07.9526913Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex128 PASSED [0.0165s] [ 7%]
2025-12-04T14:00:07.9527283Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex32 PASSED [0.0142s] [ 7%]
2025-12-04T14:00:07.9527656Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex64 PASSED [0.0149s] [ 7%]
2025-12-04T14:00:07.9528019Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float16 PASSED [0.0145s] [ 7%]
2025-12-04T14:00:07.9528383Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float32 PASSED [0.0139s] [ 7%]
2025-12-04T14:00:07.9528785Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float64 PASSED [0.0150s] [ 7%]
2025-12-04T14:00:07.9529143Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int16 PASSED [0.0139s] [ 7%]
2025-12-04T14:00:07.9529497Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int32 PASSED [0.0136s] [ 7%]
2025-12-04T14:00:07.9529846Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int64 PASSED [0.0137s] [ 8%]
2025-12-04T14:00:07.9530195Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int8 PASSED [0.0141s] [ 8%]
2025-12-04T14:00:07.9530543Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_uint8 PASSED [0.0138s] [ 8%]
2025-12-04T14:00:07.9530910Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bfloat16 PASSED [0.0168s] [ 8%]
2025-12-04T14:00:07.9531262Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bool PASSED [0.0147s] [ 8%]
2025-12-04T14:00:07.9531633Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex128 PASSED [0.0218s] [ 8%]
2025-12-04T14:00:07.9532003Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex32 PASSED [0.2239s] [ 8%]
2025-12-04T14:00:07.9532417Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex64 PASSED [0.0190s] [ 8%]
2025-12-04T14:00:07.9532815Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float16 PASSED [0.0164s] [ 8%]
2025-12-04T14:00:07.9533180Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float32 PASSED [0.0164s] [ 8%]
2025-12-04T14:00:07.9533538Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float64 PASSED [0.0179s] [ 8%]
2025-12-04T14:00:07.9533887Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int16 PASSED [0.0146s] [ 8%]
2025-12-04T14:00:07.9534242Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int32 PASSED [0.0148s] [ 8%]
2025-12-04T14:00:07.9534775Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int64 PASSED [0.0153s] [ 8%]
2025-12-04T14:00:07.9535225Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int8 PASSED [0.0145s] [ 9%]
2025-12-04T14:00:07.9535574Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_uint8 PASSED [0.0146s] [ 9%]
2025-12-04T14:00:07.9535938Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bfloat16 PASSED [0.0129s] [ 9%]
2025-12-04T14:00:07.9536286Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bool PASSED [0.0126s] [ 9%]
2025-12-04T14:00:07.9536714Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex128 PASSED [0.0143s] [ 9%]
2025-12-04T14:00:07.9537127Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex32 PASSED [0.0137s] [ 9%]
2025-12-04T14:00:07.9537490Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex64 PASSED [0.0133s] [ 9%]
2025-12-04T14:00:07.9537849Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float16 PASSED [0.0127s] [ 9%]
2025-12-04T14:00:07.9538214Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float32 PASSED [0.0128s] [ 9%]
2025-12-04T14:00:07.9538572Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float64 PASSED [0.0136s] [ 9%]
2025-12-04T14:00:07.9538925Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int16 PASSED [0.0124s] [ 9%]
2025-12-04T14:00:07.9539353Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int32 PASSED [0.0124s] [ 9%]
2025-12-04T14:00:07.9539702Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int64 PASSED [0.0126s] [ 9%]
2025-12-04T14:00:07.9540055Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int8 PASSED [0.0129s] [ 9%]
2025-12-04T14:00:07.9540404Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_uint8 PASSED [0.0124s] [ 10%]
2025-12-04T14:00:07.9540772Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bfloat16 PASSED [0.0222s] [ 10%]
2025-12-04T14:00:07.9541119Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bool PASSED [0.0216s] [ 10%]
2025-12-04T14:00:07.9541493Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex128 PASSED [0.0257s] [ 10%]
2025-12-04T14:00:07.9541864Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex32 PASSED [0.0241s] [ 10%]
2025-12-04T14:00:07.9542236Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex64 PASSED [0.0236s] [ 10%]
2025-12-04T14:00:07.9542601Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float16 PASSED [0.0222s] [ 10%]
2025-12-04T14:00:07.9542960Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float32 PASSED [0.0230s] [ 10%]
2025-12-04T14:00:07.9543316Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float64 PASSED [0.0235s] [ 10%]
2025-12-04T14:00:07.9543722Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int16 PASSED [0.0216s] [ 10%]
2025-12-04T14:00:07.9544109Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int32 PASSED [0.0218s] [ 10%]
2025-12-04T14:00:07.9544465Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int64 PASSED [0.0225s] [ 10%]
2025-12-04T14:00:07.9544808Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int8 PASSED [0.0217s] [ 10%]
2025-12-04T14:00:07.9545159Z test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_uint8 PASSED [0.0216s] [ 10%]
2025-12-04T14:00:07.9545998Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSC_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0013s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9546807Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSR_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9547623Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCOO_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0016s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9548490Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSC_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9549365Z test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSR_cuda <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/sparse/__init__.py SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9549946Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSC_cuda SKIPPED [0.0003s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 11%]
2025-12-04T14:00:07.9550525Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSR_cuda SKIPPED [0.0003s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 11%]
2025-12-04T14:00:07.9550872Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCOO_cuda PASSED [15.7366s] [ 11%]
2025-12-04T14:00:07.9551211Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSC_cuda PASSED [59.6674s] [ 11%]
2025-12-04T14:00:07.9551553Z test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSR_cuda PASSED [51.3308s] [ 11%]
2025-12-04T14:00:07.9552049Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSC_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9552539Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9553039Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCOO_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9553532Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 11%]
2025-12-04T14:00:07.9554024Z test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9554450Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9554875Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSR_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9555300Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCOO_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9555719Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9556189Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9556639Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_Strided_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9557078Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9557521Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSR_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9557963Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCOO_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9558402Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSC_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9558889Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9559319Z test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_Strided_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 12%]
2025-12-04T14:00:07.9559727Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSC_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9560133Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSR_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9560584Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCOO_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9561024Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSC_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9561425Z test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSR_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 13%]
2025-12-04T14:00:07.9561727Z test_sparse.py::TestSparseAnyCUDA::test_generate_simple_inputs_cuda PASSED [0.1417s] [ 13%]
2025-12-04T14:00:07.9562167Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_complex128 SKIPPED [0.0025s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9562604Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_float64 SKIPPED [0.0018s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9563041Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_complex128 SKIPPED [0.0018s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9563464Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_float64 SKIPPED [0.0017s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9563906Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_complex128 SKIPPED [0.0018s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9564327Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_float64 SKIPPED [0.0017s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9564773Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_complex128 SKIPPED [0.0022s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9565199Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_float64 SKIPPED [0.0017s] (NOT IMPL) [ 13%]
2025-12-04T14:00:07.9565635Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_complex128 SKIPPED [0.0129s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9566061Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_float64 SKIPPED [0.0039s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9566498Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_complex128 SKIPPED [0.0234s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9566923Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_float64 SKIPPED [0.0126s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9567356Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_complex128 SKIPPED [0.0105s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9567827Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_float64 SKIPPED [0.0059s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9568334Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_complex128 SKIPPED [0.0277s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9568807Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_float64 SKIPPED [0.0153s] (NOT IMPL) [ 14%]
2025-12-04T14:00:07.9569203Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_complex128 PASSED [0.0708s] [ 14%]
2025-12-04T14:00:07.9569586Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_float64 PASSED [0.0327s] [ 14%]
2025-12-04T14:00:07.9569978Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_complex128 PASSED [0.1016s] [ 14%]
2025-12-04T14:00:07.9570363Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_float64 PASSED [0.0398s] [ 14%]
2025-12-04T14:00:07.9570751Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_complex128 PASSED [0.0211s] [ 14%]
2025-12-04T14:00:07.9571139Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_float64 PASSED [0.0084s] [ 14%]
2025-12-04T14:00:07.9571526Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_complex128 PASSED [0.1082s] [ 15%]
2025-12-04T14:00:07.9571944Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_float64 PASSED [0.0373s] [ 15%]
2025-12-04T14:00:07.9572430Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_complex128 SKIPPED [0.0410s] (NOT IMPL) [ 15%]
2025-12-04T14:00:07.9572852Z
test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_float64 SKIPPED [0.0053s] (NOT IMPL) [ 15%] 2025-12-04T14:00:07.9573295Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_complex128 SKIPPED [0.0441s] (NOT IMPL) [ 15%] 2025-12-04T14:00:07.9573726Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_float64 SKIPPED [0.0235s] (NOT IMPL) [ 15%] 2025-12-04T14:00:07.9574114Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_complex128 PASSED [0.0260s] [ 15%] 2025-12-04T14:00:07.9574492Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_float64 PASSED [0.0095s] [ 15%] 2025-12-04T14:00:07.9574884Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_complex128 PASSED [0.1576s] [ 15%] 2025-12-04T14:00:07.9575266Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_float64 PASSED [0.0444s] [ 15%] 2025-12-04T14:00:07.9575656Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_complex128 PASSED [0.0368s] [ 15%] 2025-12-04T14:00:07.9576030Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_float64 PASSED [0.0078s] [ 15%] 2025-12-04T14:00:07.9576422Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_complex128 PASSED [0.1094s] [ 15%] 2025-12-04T14:00:07.9576805Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_float64 PASSED [0.0450s] [ 15%] 2025-12-04T14:00:07.9577196Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_complex128 PASSED [0.0171s] [ 16%] 2025-12-04T14:00:07.9577572Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_float64 PASSED [0.0074s] [ 16%] 2025-12-04T14:00:07.9577961Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_complex128 PASSED [0.0943s] [ 
16%] 2025-12-04T14:00:07.9578342Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_float64 PASSED [0.0339s] [ 16%] 2025-12-04T14:00:07.9578802Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_complex128 PASSED [21.2771s] [ 16%] 2025-12-04T14:00:07.9579321Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_float64 PASSED [8.8414s] [ 16%] 2025-12-04T14:00:07.9579794Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_complex128 PASSED [15.0273s] [ 16%] 2025-12-04T14:00:07.9580199Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_float64 PASSED [5.5979s] [ 16%] 2025-12-04T14:00:07.9580625Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_complex128 PASSED [20.1339s] [ 16%] 2025-12-04T14:00:07.9581033Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_float64 PASSED [8.0320s] [ 16%] 2025-12-04T14:00:07.9581456Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_complex128 PASSED [14.6100s] [ 16%] 2025-12-04T14:00:07.9581858Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_float64 PASSED [5.4254s] [ 16%] 2025-12-04T14:00:07.9582279Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_complex128 PASSED [13.3815s] [ 16%] 2025-12-04T14:00:07.9582690Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_float64 PASSED [5.6125s] [ 16%] 2025-12-04T14:00:07.9583110Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_complex128 PASSED [12.7898s] [ 17%] 2025-12-04T14:00:07.9583558Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_float64 PASSED [4.8634s] [ 17%] 2025-12-04T14:00:07.9584018Z 
test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_complex128 PASSED [15.1337s] [ 17%] 2025-12-04T14:00:07.9584420Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_float64 PASSED [6.5944s] [ 17%] 2025-12-04T14:00:07.9584849Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_complex128 PASSED [14.1451s] [ 17%] 2025-12-04T14:00:07.9585257Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_float64 PASSED [5.2235s] [ 17%] 2025-12-04T14:00:07.9585681Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_complex128 PASSED [12.0550s] [ 17%] 2025-12-04T14:00:07.9586082Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_float64 PASSED [4.9622s] [ 17%] 2025-12-04T14:00:07.9586505Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_complex128 PASSED [13.7667s] [ 17%] 2025-12-04T14:00:07.9586913Z test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_float64 PASSED [4.9865s] [ 17%] 2025-12-04T14:00:07.9587268Z test_sparse.py::TestSparseAnyCUDA::test_invalid_blocksize_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 17%] 2025-12-04T14:00:07.9587627Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_bfloat16 PASSED [0.0911s] [ 17%] 2025-12-04T14:00:07.9587994Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex128 PASSED [0.0867s] [ 17%] 2025-12-04T14:00:07.9588348Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex32 PASSED [0.0864s] [ 17%] 2025-12-04T14:00:07.9588714Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex64 PASSED [0.0873s] [ 18%] 2025-12-04T14:00:07.9589101Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float16 PASSED [0.0863s] [ 18%] 
2025-12-04T14:00:07.9589449Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float32 PASSED [0.0860s] [ 18%] 2025-12-04T14:00:07.9589790Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float64 PASSED [0.0702s] [ 18%] 2025-12-04T14:00:07.9590137Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_bfloat16 PASSED [0.0859s] [ 18%] 2025-12-04T14:00:07.9590553Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex128 PASSED [0.0860s] [ 18%] 2025-12-04T14:00:07.9590948Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex32 PASSED [0.0873s] [ 18%] 2025-12-04T14:00:07.9591307Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex64 PASSED [0.0860s] [ 18%] 2025-12-04T14:00:07.9591650Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float16 PASSED [0.0850s] [ 18%] 2025-12-04T14:00:07.9591991Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float32 PASSED [0.0864s] [ 18%] 2025-12-04T14:00:07.9592343Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float64 PASSED [0.0689s] [ 18%] 2025-12-04T14:00:07.9592758Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_bfloat16 SKIPPED [0.0159s] (NO SAMPLES!) [ 18%] 2025-12-04T14:00:07.9593195Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex128 SKIPPED [0.0158s] (NO SAMPLES!) [ 18%] 2025-12-04T14:00:07.9593619Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex32 SKIPPED [0.0154s] (NO SAMPLES!) [ 18%] 2025-12-04T14:00:07.9594041Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex64 SKIPPED [0.0154s] (NO SAMPLES!) [ 19%] 2025-12-04T14:00:07.9594497Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float16 SKIPPED [0.0157s] (NO SAMPLES!) 
[ 19%] 2025-12-04T14:00:07.9594908Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float32 SKIPPED [0.0153s] (NO SAMPLES!) [ 19%] 2025-12-04T14:00:07.9595449Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float64 SKIPPED [0.0153s] (NO SAMPLES!) [ 19%] 2025-12-04T14:00:07.9595804Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_bfloat16 PASSED [0.0802s] [ 19%] 2025-12-04T14:00:07.9596165Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex128 PASSED [0.0792s] [ 19%] 2025-12-04T14:00:07.9596557Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex32 PASSED [0.0791s] [ 19%] 2025-12-04T14:00:07.9596915Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex64 PASSED [0.0799s] [ 19%] 2025-12-04T14:00:07.9597268Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float16 PASSED [0.0787s] [ 19%] 2025-12-04T14:00:07.9597616Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float32 PASSED [0.0791s] [ 19%] 2025-12-04T14:00:07.9597963Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float64 PASSED [0.0645s] [ 19%] 2025-12-04T14:00:07.9598322Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_bfloat16 PASSED [0.0795s] [ 19%] 2025-12-04T14:00:07.9598684Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex128 PASSED [0.0794s] [ 19%] 2025-12-04T14:00:07.9599047Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex32 PASSED [0.0801s] [ 19%] 2025-12-04T14:00:07.9599409Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex64 PASSED [0.0790s] [ 20%] 2025-12-04T14:00:07.9599755Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float16 PASSED [0.0793s] [ 20%] 2025-12-04T14:00:07.9600113Z 
test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float32 PASSED [0.0804s] [ 20%] 2025-12-04T14:00:07.9600463Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float64 PASSED [0.0636s] [ 20%] 2025-12-04T14:00:07.9600812Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bfloat16 PASSED [0.0630s] [ 20%] 2025-12-04T14:00:07.9601153Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bool PASSED [0.0642s] [ 20%] 2025-12-04T14:00:07.9601560Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex128 PASSED [0.0636s] [ 20%] 2025-12-04T14:00:07.9601965Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex32 PASSED [0.0633s] [ 20%] 2025-12-04T14:00:07.9602323Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex64 PASSED [0.0640s] [ 20%] 2025-12-04T14:00:07.9602665Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float16 PASSED [0.0631s] [ 20%] 2025-12-04T14:00:07.9603019Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float32 PASSED [0.0629s] [ 20%] 2025-12-04T14:00:07.9603364Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float64 PASSED [0.0568s] [ 20%] 2025-12-04T14:00:07.9603709Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int16 PASSED [0.0629s] [ 20%] 2025-12-04T14:00:07.9604051Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int32 PASSED [0.0618s] [ 20%] 2025-12-04T14:00:07.9604386Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int64 PASSED [0.0642s] [ 21%] 2025-12-04T14:00:07.9604737Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int8 PASSED [0.0632s] [ 21%] 2025-12-04T14:00:07.9605076Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_uint8 PASSED [0.0632s] [ 
21%] 2025-12-04T14:00:07.9605478Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bfloat16 PASSED [0.0640s] [ 21%] 2025-12-04T14:00:07.9605850Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bool PASSED [0.0628s] [ 21%] 2025-12-04T14:00:07.9606211Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex128 PASSED [0.0628s] [ 21%] 2025-12-04T14:00:07.9606575Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex32 PASSED [0.0639s] [ 21%] 2025-12-04T14:00:07.9606932Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex64 PASSED [0.0627s] [ 21%] 2025-12-04T14:00:07.9607290Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float16 PASSED [0.0630s] [ 21%] 2025-12-04T14:00:07.9607637Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float32 PASSED [0.0641s] [ 21%] 2025-12-04T14:00:07.9608156Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float64 PASSED [0.0551s] [ 21%] 2025-12-04T14:00:07.9608516Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int16 PASSED [0.0628s] [ 21%] 2025-12-04T14:00:07.9608891Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int32 PASSED [0.0641s] [ 21%] 2025-12-04T14:00:07.9609239Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int64 PASSED [0.0624s] [ 21%] 2025-12-04T14:00:07.9609573Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int8 PASSED [0.0629s] [ 22%] 2025-12-04T14:00:07.9609907Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_uint8 PASSED [0.0640s] [ 22%] 2025-12-04T14:00:07.9610332Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bfloat16 SKIPPED [0.0156s] (NO SAMPLES!) 
[ 22%] 2025-12-04T14:00:07.9610737Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bool SKIPPED [0.0154s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9611173Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex128 SKIPPED [0.0157s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9611594Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex32 SKIPPED [0.0154s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9612017Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex64 SKIPPED [0.0154s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9612541Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float16 SKIPPED [0.0157s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9613007Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float32 SKIPPED [0.0154s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9613424Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float64 SKIPPED [0.0153s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9613824Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int16 SKIPPED [0.0158s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9614227Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int32 SKIPPED [0.0153s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9614637Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int64 SKIPPED [0.0154s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9615034Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int8 SKIPPED [0.0157s] (NO SAMPLES!) [ 22%] 2025-12-04T14:00:07.9615441Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_uint8 SKIPPED [0.0153s] (NO SAMPLES!) 
[ 23%] 2025-12-04T14:00:07.9615794Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bfloat16 PASSED [0.0554s] [ 23%] 2025-12-04T14:00:07.9616125Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bool PASSED [0.0563s] [ 23%] 2025-12-04T14:00:07.9616488Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex128 PASSED [0.0553s] [ 23%] 2025-12-04T14:00:07.9616901Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex32 PASSED [0.0553s] [ 23%] 2025-12-04T14:00:07.9617325Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex64 PASSED [0.0558s] [ 23%] 2025-12-04T14:00:07.9617675Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float16 PASSED [0.0559s] [ 23%] 2025-12-04T14:00:07.9618019Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float32 PASSED [0.0559s] [ 23%] 2025-12-04T14:00:07.9618366Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float64 PASSED [0.0494s] [ 23%] 2025-12-04T14:00:07.9618702Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int16 PASSED [0.0554s] [ 23%] 2025-12-04T14:00:07.9619077Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int32 PASSED [0.0554s] [ 23%] 2025-12-04T14:00:07.9619420Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int64 PASSED [0.0564s] [ 23%] 2025-12-04T14:00:07.9619751Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int8 PASSED [0.0551s] [ 23%] 2025-12-04T14:00:07.9620096Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_uint8 PASSED [0.0552s] [ 23%] 2025-12-04T14:00:07.9620443Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bfloat16 PASSED [0.0564s] [ 24%] 2025-12-04T14:00:07.9620774Z 
test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bool PASSED [0.0553s] [ 24%] 2025-12-04T14:00:07.9621141Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex128 PASSED [0.0546s] [ 24%] 2025-12-04T14:00:07.9621496Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex32 PASSED [0.0564s] [ 24%] 2025-12-04T14:00:07.9621859Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex64 PASSED [0.0556s] [ 24%] 2025-12-04T14:00:07.9622201Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float16 PASSED [0.0553s] [ 24%] 2025-12-04T14:00:07.9622542Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float32 PASSED [0.0564s] [ 24%] 2025-12-04T14:00:07.9622890Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float64 PASSED [0.0489s] [ 24%] 2025-12-04T14:00:07.9623222Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int16 PASSED [0.0551s] [ 24%] 2025-12-04T14:00:07.9623636Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int32 PASSED [0.0565s] [ 24%] 2025-12-04T14:00:07.9624034Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int64 PASSED [0.0554s] [ 24%] 2025-12-04T14:00:07.9624391Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int8 PASSED [0.0552s] [ 24%] 2025-12-04T14:00:07.9624762Z test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_uint8 PASSED [0.0562s] [ 24%] 2025-12-04T14:00:07.9625185Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSC_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 25%] 2025-12-04T14:00:07.9625618Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 25%] 2025-12-04T14:00:07.9626039Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCOO_cuda SKIPPED 
[0.0012s] (Only runs on cpu) [ 25%] 2025-12-04T14:00:07.9626468Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSC_cuda SKIPPED [0.0013s] (Only runs on cpu) [ 25%] 2025-12-04T14:00:07.9626894Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSR_cuda SKIPPED [0.0012s] (Only runs on cpu) [ 25%] 2025-12-04T14:00:07.9627304Z test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_Strided_cuda SKIPPED [0.0015s] (Only runs on cpu) [ 25%] 2025-12-04T14:00:07.9627730Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex128 PASSED [0.0115s] [ 25%] 2025-12-04T14:00:07.9628191Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex64 PASSED [0.0112s] [ 25%] 2025-12-04T14:00:07.9628606Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float32 PASSED [0.0109s] [ 25%] 2025-12-04T14:00:07.9628988Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float64 PASSED [0.0116s] [ 25%] 2025-12-04T14:00:07.9629380Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex128 PASSED [0.0106s] [ 25%] 2025-12-04T14:00:07.9629772Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex64 PASSED [0.0106s] [ 25%] 2025-12-04T14:00:07.9630144Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float32 PASSED [0.0103s] [ 25%] 2025-12-04T14:00:07.9630523Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float64 PASSED [0.0107s] [ 25%] 2025-12-04T14:00:07.9630921Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex128 PASSED [0.0440s] [ 26%] 2025-12-04T14:00:07.9631316Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex64 PASSED [0.0441s] [ 26%] 2025-12-04T14:00:07.9631702Z 
test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float32 PASSED [0.0417s] [ 26%] 2025-12-04T14:00:07.9632076Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float64 PASSED [0.0419s] [ 26%] 2025-12-04T14:00:07.9632473Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex128 PASSED [0.0106s] [ 26%] 2025-12-04T14:00:07.9632870Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex64 PASSED [0.0105s] [ 26%] 2025-12-04T14:00:07.9633245Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float32 PASSED [0.0103s] [ 26%] 2025-12-04T14:00:07.9633629Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float64 PASSED [0.0108s] [ 26%] 2025-12-04T14:00:07.9634017Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex128 PASSED [0.0257s] [ 26%] 2025-12-04T14:00:07.9634402Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex64 PASSED [0.0127s] [ 26%] 2025-12-04T14:00:07.9634788Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float32 PASSED [0.0122s] [ 26%] 2025-12-04T14:00:07.9635206Z test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float64 PASSED [0.0126s] [ 26%] 2025-12-04T14:00:07.9635592Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bfloat16 PASSED [0.0108s] [ 26%] 2025-12-04T14:00:07.9635934Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bool PASSED [0.0055s] [ 26%] 2025-12-04T14:00:07.9636281Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex128 PASSED [0.0108s] [ 27%] 2025-12-04T14:00:07.9636634Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex32 PASSED [0.7139s] [ 27%] 2025-12-04T14:00:07.9636980Z 
test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex64 PASSED [0.0110s] [ 27%] 2025-12-04T14:00:07.9637319Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float16 PASSED [0.0108s] [ 27%] 2025-12-04T14:00:07.9637663Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float32 PASSED [0.0106s] [ 27%] 2025-12-04T14:00:07.9637996Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float64 PASSED [0.0111s] [ 27%] 2025-12-04T14:00:07.9638340Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int16 PASSED [0.0104s] [ 27%] 2025-12-04T14:00:07.9638670Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int32 PASSED [0.0104s] [ 27%] 2025-12-04T14:00:07.9639045Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int64 PASSED [0.0102s] [ 27%] 2025-12-04T14:00:07.9639379Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int8 PASSED [0.0107s] [ 27%] 2025-12-04T14:00:07.9639749Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_uint8 PASSED [0.0055s] [ 27%] 2025-12-04T14:00:07.9640092Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bfloat16 PASSED [0.0103s] [ 27%] 2025-12-04T14:00:07.9640422Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bool PASSED [0.0053s] [ 27%] 2025-12-04T14:00:07.9640769Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex128 PASSED [0.0107s] [ 27%] 2025-12-04T14:00:07.9641119Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex32 PASSED [0.0103s] [ 28%] 2025-12-04T14:00:07.9641463Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex64 PASSED [0.0103s] [ 28%] 2025-12-04T14:00:07.9641803Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float16 PASSED [0.0102s] [ 28%] 2025-12-04T14:00:07.9642134Z 
test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float32 PASSED [0.0113s] [ 28%] 2025-12-04T14:00:07.9642470Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float64 PASSED [0.0106s] [ 28%] 2025-12-04T14:00:07.9642808Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int16 PASSED [0.0098s] [ 28%] 2025-12-04T14:00:07.9643136Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int32 PASSED [0.0098s] [ 28%] 2025-12-04T14:00:07.9643462Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int64 PASSED [0.0105s] [ 28%] 2025-12-04T14:00:07.9643797Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int8 PASSED [0.0098s] [ 28%] 2025-12-04T14:00:07.9644125Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_uint8 PASSED [0.0053s] [ 28%] 2025-12-04T14:00:07.9644470Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bfloat16 PASSED [0.0336s] [ 28%] 2025-12-04T14:00:07.9644793Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bool PASSED [0.0177s] [ 28%] 2025-12-04T14:00:07.9645147Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex128 PASSED [0.0348s] [ 28%] 2025-12-04T14:00:07.9645496Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex32 PASSED [1.1282s] [ 28%] 2025-12-04T14:00:07.9645881Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex64 PASSED [0.0343s] [ 29%] 2025-12-04T14:00:07.9646261Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float16 PASSED [0.0338s] [ 29%] 2025-12-04T14:00:07.9646595Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float32 PASSED [0.0329s] [ 29%] 2025-12-04T14:00:07.9646927Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float64 PASSED [0.0329s] [ 29%] 2025-12-04T14:00:07.9647262Z 
test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int16 PASSED [0.0168s] [ 29%] 2025-12-04T14:00:07.9647589Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int32 PASSED [0.0172s] [ 29%] 2025-12-04T14:00:07.9647918Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int64 PASSED [0.0159s] [ 29%] 2025-12-04T14:00:07.9648243Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int8 PASSED [0.0167s] [ 29%] 2025-12-04T14:00:07.9648572Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_uint8 PASSED [0.0167s] [ 29%] 2025-12-04T14:00:07.9648920Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bfloat16 PASSED [0.0106s] [ 29%] 2025-12-04T14:00:07.9649246Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bool PASSED [0.0052s] [ 29%] 2025-12-04T14:00:07.9649598Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex128 PASSED [0.0101s] [ 29%] 2025-12-04T14:00:07.9649982Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex32 PASSED [0.0102s] [ 29%] 2025-12-04T14:00:07.9650391Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex64 PASSED [0.0106s] [ 29%] 2025-12-04T14:00:07.9650726Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float16 PASSED [0.0100s] [ 30%] 2025-12-04T14:00:07.9651058Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float32 PASSED [0.0100s] [ 30%] 2025-12-04T14:00:07.9651391Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float64 PASSED [0.0100s] [ 30%] 2025-12-04T14:00:07.9651727Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int16 PASSED [0.0101s] [ 30%] 2025-12-04T14:00:07.9652052Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int32 PASSED [0.0097s] [ 30%] 2025-12-04T14:00:07.9652377Z 
test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int64 PASSED [0.0095s] [ 30%] 2025-12-04T14:00:07.9652702Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int8 PASSED [0.0097s] [ 30%] 2025-12-04T14:00:07.9653030Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_uint8 PASSED [0.0056s] [ 30%] 2025-12-04T14:00:07.9653373Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bfloat16 PASSED [0.0121s] [ 30%] 2025-12-04T14:00:07.9653697Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bool PASSED [0.0050s] [ 30%] 2025-12-04T14:00:07.9654053Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex128 PASSED [0.0120s] [ 30%] 2025-12-04T14:00:07.9654397Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex32 PASSED [0.0101s] [ 30%] 2025-12-04T14:00:07.9654739Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex64 PASSED [0.0121s] [ 30%] 2025-12-04T14:00:07.9655078Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float16 PASSED [0.0121s] [ 30%] 2025-12-04T14:00:07.9655410Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float32 PASSED [0.0118s] [ 31%] 2025-12-04T14:00:07.9655755Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float64 PASSED [0.0124s] [ 31%] 2025-12-04T14:00:07.9656079Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int16 PASSED [0.0113s] [ 31%] 2025-12-04T14:00:07.9656452Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int32 PASSED [0.0112s] [ 31%] 2025-12-04T14:00:07.9656788Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int64 PASSED [0.0110s] [ 31%] 2025-12-04T14:00:07.9657149Z test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int8 PASSED [0.0117s] [ 31%] 2025-12-04T14:00:07.9657482Z 
test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_uint8 PASSED [0.0071s] [ 31%] 2025-12-04T14:00:07.9657807Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bfloat16 PASSED [0.0224s] [ 31%] 2025-12-04T14:00:07.9658117Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bool PASSED [0.0208s] [ 31%] 2025-12-04T14:00:07.9658458Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex128 PASSED [0.0224s] [ 31%] 2025-12-04T14:00:07.9658785Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex64 PASSED [0.0230s] [ 31%] 2025-12-04T14:00:07.9659149Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float16 PASSED [0.0221s] [ 31%] 2025-12-04T14:00:07.9659473Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float32 PASSED [0.0222s] [ 31%] 2025-12-04T14:00:07.9659796Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float64 PASSED [0.0221s] [ 31%] 2025-12-04T14:00:07.9660111Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int16 PASSED [0.0209s] [ 32%] 2025-12-04T14:00:07.9660478Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int32 PASSED [0.0209s] [ 32%] 2025-12-04T14:00:07.9660788Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int64 PASSED [0.0213s] [ 32%] 2025-12-04T14:00:07.9661142Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int8 PASSED [0.0208s] [ 32%] 2025-12-04T14:00:07.9661451Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_uint8 PASSED [0.0209s] [ 32%] 2025-12-04T14:00:07.9661782Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bfloat16 PASSED [0.0218s] [ 32%] 2025-12-04T14:00:07.9662085Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bool PASSED [0.0206s] [ 32%] 2025-12-04T14:00:07.9662419Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex128 PASSED [0.0222s] [ 32%] 2025-12-04T14:00:07.9662759Z
test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex64 PASSED [0.0226s] [ 32%] 2025-12-04T14:00:07.9663078Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float16 PASSED [0.0218s] [ 32%] 2025-12-04T14:00:07.9663399Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float32 PASSED [0.0218s] [ 32%] 2025-12-04T14:00:07.9663719Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float64 PASSED [0.0218s] [ 32%] 2025-12-04T14:00:07.9664032Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int16 PASSED [0.0207s] [ 32%] 2025-12-04T14:00:07.9664346Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int32 PASSED [0.0207s] [ 32%] 2025-12-04T14:00:07.9664656Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int64 PASSED [0.0212s] [ 33%] 2025-12-04T14:00:07.9664962Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int8 PASSED [0.0206s] [ 33%] 2025-12-04T14:00:07.9665278Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_uint8 PASSED [0.0206s] [ 33%] 2025-12-04T14:00:07.9665601Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bfloat16 PASSED [0.0141s] [ 33%] 2025-12-04T14:00:07.9665911Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bool PASSED [0.0125s] [ 33%] 2025-12-04T14:00:07.9666248Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex128 PASSED [0.0142s] [ 33%] 2025-12-04T14:00:07.9666579Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex64 PASSED [0.0147s] [ 33%] 2025-12-04T14:00:07.9666960Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float16 PASSED [0.0139s] [ 33%] 2025-12-04T14:00:07.9667281Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float32 PASSED [0.0138s] [ 33%] 2025-12-04T14:00:07.9667648Z
test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float64 PASSED [0.0138s] [ 33%] 2025-12-04T14:00:07.9667957Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int16 PASSED [0.0126s] [ 33%] 2025-12-04T14:00:07.9668266Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int32 PASSED [0.0128s] [ 33%] 2025-12-04T14:00:07.9668602Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int64 PASSED [0.0131s] [ 33%] 2025-12-04T14:00:07.9668944Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int8 PASSED [0.0126s] [ 33%] 2025-12-04T14:00:07.9669264Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_uint8 PASSED [0.0126s] [ 34%] 2025-12-04T14:00:07.9669585Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bfloat16 PASSED [0.0215s] [ 34%] 2025-12-04T14:00:07.9669896Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bool PASSED [0.0202s] [ 34%] 2025-12-04T14:00:07.9670239Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex128 PASSED [0.0217s] [ 34%] 2025-12-04T14:00:07.9670566Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex64 PASSED [0.0222s] [ 34%] 2025-12-04T14:00:07.9670884Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float16 PASSED [0.0214s] [ 34%] 2025-12-04T14:00:07.9671255Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float32 PASSED [0.0214s] [ 34%] 2025-12-04T14:00:07.9671610Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float64 PASSED [0.0214s] [ 34%] 2025-12-04T14:00:07.9671923Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int16 PASSED [0.0202s] [ 34%] 2025-12-04T14:00:07.9672231Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int32 PASSED [0.0201s] [ 34%] 2025-12-04T14:00:07.9672541Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int64 PASSED [0.0207s] [ 34%] 2025-12-04T14:00:07.9672857Z
test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int8 PASSED [0.0202s] [ 34%] 2025-12-04T14:00:07.9673165Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_uint8 PASSED [0.0202s] [ 34%] 2025-12-04T14:00:07.9673490Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bfloat16 PASSED [0.0210s] [ 34%] 2025-12-04T14:00:07.9673798Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bool PASSED [0.0198s] [ 35%] 2025-12-04T14:00:07.9674134Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex128 PASSED [0.0213s] [ 35%] 2025-12-04T14:00:07.9674466Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex64 PASSED [0.0218s] [ 35%] 2025-12-04T14:00:07.9674786Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float16 PASSED [0.0210s] [ 35%] 2025-12-04T14:00:07.9675112Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float32 PASSED [0.0209s] [ 35%] 2025-12-04T14:00:07.9675431Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float64 PASSED [0.0209s] [ 35%] 2025-12-04T14:00:07.9675742Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int16 PASSED [0.0197s] [ 35%] 2025-12-04T14:00:07.9676056Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int32 PASSED [0.0197s] [ 35%] 2025-12-04T14:00:07.9676363Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int64 PASSED [0.0202s] [ 35%] 2025-12-04T14:00:07.9676678Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int8 PASSED [0.0198s] [ 35%] 2025-12-04T14:00:07.9676985Z test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_uint8 PASSED [0.0197s] [ 35%] 2025-12-04T14:00:07.9677314Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bfloat16 PASSED [0.0562s] [ 35%] 2025-12-04T14:00:07.9677681Z
test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bool PASSED [0.0468s] [ 35%] 2025-12-04T14:00:07.9678063Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex128 PASSED [0.0577s] [ 35%] 2025-12-04T14:00:07.9678402Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex64 PASSED [0.0581s] [ 36%] 2025-12-04T14:00:07.9678733Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float16 PASSED [0.0557s] [ 36%] 2025-12-04T14:00:07.9679058Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float32 PASSED [0.0558s] [ 36%] 2025-12-04T14:00:07.9679390Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float64 PASSED [0.0556s] [ 36%] 2025-12-04T14:00:07.9679709Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int16 PASSED [0.0469s] [ 36%] 2025-12-04T14:00:07.9680031Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int32 PASSED [0.0468s] [ 36%] 2025-12-04T14:00:07.9680360Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int64 PASSED [0.0473s] [ 36%] 2025-12-04T14:00:07.9680677Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int8 PASSED [0.0470s] [ 36%] 2025-12-04T14:00:07.9680999Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_uint8 PASSED [0.0468s] [ 36%] 2025-12-04T14:00:07.9681373Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bfloat16 PASSED [0.0556s] [ 36%] 2025-12-04T14:00:07.9681691Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bool PASSED [0.0468s] [ 36%] 2025-12-04T14:00:07.9682081Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex128 PASSED [0.0576s] [ 36%] 2025-12-04T14:00:07.9682417Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex64 PASSED [0.0578s] [ 36%] 2025-12-04T14:00:07.9682749Z 
test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float16 PASSED [0.0558s] [ 36%] 2025-12-04T14:00:07.9683071Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float32 PASSED [0.0556s] [ 37%] 2025-12-04T14:00:07.9683399Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float64 PASSED [0.0557s] [ 37%] 2025-12-04T14:00:07.9683721Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int16 PASSED [0.0468s] [ 37%] 2025-12-04T14:00:07.9684040Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int32 PASSED [0.0468s] [ 37%] 2025-12-04T14:00:07.9684365Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int64 PASSED [0.0470s] [ 37%] 2025-12-04T14:00:07.9684684Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int8 PASSED [0.0467s] [ 37%] 2025-12-04T14:00:07.9685001Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_uint8 PASSED [0.0468s] [ 37%] 2025-12-04T14:00:07.9685341Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bfloat16 PASSED [0.0558s] [ 37%] 2025-12-04T14:00:07.9685657Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bool PASSED [0.0468s] [ 37%] 2025-12-04T14:00:07.9685999Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex128 PASSED [0.0575s] [ 37%] 2025-12-04T14:00:07.9686337Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex64 PASSED [0.0580s] [ 37%] 2025-12-04T14:00:07.9686666Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float16 PASSED [0.0556s] [ 37%] 2025-12-04T14:00:07.9687000Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float32 PASSED [0.0556s] [ 37%] 2025-12-04T14:00:07.9687325Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float64 PASSED [0.0555s] [ 38%] 2025-12-04T14:00:07.9687642Z 
test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int16 PASSED [0.0468s] [ 38%] 2025-12-04T14:00:07.9688018Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int32 PASSED [0.0468s] [ 38%] 2025-12-04T14:00:07.9688375Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int64 PASSED [0.0475s] [ 38%] 2025-12-04T14:00:07.9688726Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int8 PASSED [0.0470s] [ 38%] 2025-12-04T14:00:07.9689069Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_uint8 PASSED [0.0470s] [ 38%] 2025-12-04T14:00:07.9689403Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bfloat16 PASSED [0.0554s] [ 38%] 2025-12-04T14:00:07.9689728Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bool PASSED [0.0467s] [ 38%] 2025-12-04T14:00:07.9690071Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex128 PASSED [0.0576s] [ 38%] 2025-12-04T14:00:07.9690414Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex64 PASSED [0.0579s] [ 38%] 2025-12-04T14:00:07.9690744Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float16 PASSED [0.0554s] [ 38%] 2025-12-04T14:00:07.9691073Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float32 PASSED [0.0555s] [ 38%] 2025-12-04T14:00:07.9691400Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float64 PASSED [0.0555s] [ 38%] 2025-12-04T14:00:07.9691719Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int16 PASSED [0.0466s] [ 38%] 2025-12-04T14:00:07.9692115Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int32 PASSED [0.0467s] [ 39%] 2025-12-04T14:00:07.9692474Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int64 PASSED [0.0469s] [ 39%] 2025-12-04T14:00:07.9692793Z 
test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int8 PASSED [0.0466s] [ 39%] 2025-12-04T14:00:07.9693116Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_uint8 PASSED [0.0467s] [ 39%] 2025-12-04T14:00:07.9693446Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bfloat16 PASSED [0.0410s] [ 39%] 2025-12-04T14:00:07.9693767Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bool PASSED [0.0321s] [ 39%] 2025-12-04T14:00:07.9694118Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex128 PASSED [0.0430s] [ 39%] 2025-12-04T14:00:07.9694450Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex64 PASSED [0.0435s] [ 39%] 2025-12-04T14:00:07.9694781Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float16 PASSED [0.0410s] [ 39%] 2025-12-04T14:00:07.9695107Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float32 PASSED [0.0409s] [ 39%] 2025-12-04T14:00:07.9695432Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float64 PASSED [0.0409s] [ 39%] 2025-12-04T14:00:07.9695757Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int16 PASSED [0.0322s] [ 39%] 2025-12-04T14:00:07.9696080Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int32 PASSED [0.0321s] [ 39%] 2025-12-04T14:00:07.9696411Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int64 PASSED [0.0325s] [ 39%] 2025-12-04T14:00:07.9696727Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int8 PASSED [0.0321s] [ 40%] 2025-12-04T14:00:07.9697045Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_uint8 PASSED [0.0321s] [ 40%] 2025-12-04T14:00:07.9697382Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bfloat16 PASSED [0.0404s] [ 40%] 2025-12-04T14:00:07.9697705Z 
test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bool PASSED [0.0316s] [ 40%] 2025-12-04T14:00:07.9698051Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex128 PASSED [0.0424s] [ 40%] 2025-12-04T14:00:07.9698385Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex64 PASSED [0.0430s] [ 40%] 2025-12-04T14:00:07.9698863Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float16 PASSED [0.0405s] [ 40%] 2025-12-04T14:00:07.9699302Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float32 PASSED [0.0403s] [ 40%] 2025-12-04T14:00:07.9704294Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float64 PASSED [0.0403s] [ 40%] 2025-12-04T14:00:07.9704631Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int16 PASSED [0.0316s] [ 40%] 2025-12-04T14:00:07.9704955Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int32 PASSED [0.0316s] [ 40%] 2025-12-04T14:00:07.9705272Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int64 PASSED [0.0319s] [ 40%] 2025-12-04T14:00:07.9705588Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int8 PASSED [0.0315s] [ 40%] 2025-12-04T14:00:07.9705910Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_uint8 PASSED [0.0316s] [ 40%] 2025-12-04T14:00:07.9706237Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bfloat16 PASSED [0.0532s] [ 41%] 2025-12-04T14:00:07.9706560Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bool PASSED [0.0449s] [ 41%] 2025-12-04T14:00:07.9706904Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex128 PASSED [0.0551s] [ 41%] 2025-12-04T14:00:07.9707318Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex64 PASSED [0.0555s] [ 41%] 2025-12-04T14:00:07.9707648Z 
test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float16 PASSED [0.0532s] [ 41%] 2025-12-04T14:00:07.9708323Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float32 PASSED [0.0532s] [ 41%] 2025-12-04T14:00:07.9708663Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float64 PASSED [0.0532s] [ 41%] 2025-12-04T14:00:07.9708983Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int16 PASSED [0.0446s] [ 41%] 2025-12-04T14:00:07.9709297Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int32 PASSED [0.0444s] [ 41%] 2025-12-04T14:00:07.9709617Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int64 PASSED [0.0447s] [ 41%] 2025-12-04T14:00:07.9709929Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int8 PASSED [0.0446s] [ 41%] 2025-12-04T14:00:07.9710244Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_uint8 PASSED [0.0446s] [ 41%] 2025-12-04T14:00:07.9710577Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bfloat16 PASSED [0.0531s] [ 41%] 2025-12-04T14:00:07.9710892Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bool PASSED [0.0446s] [ 41%] 2025-12-04T14:00:07.9711235Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex128 PASSED [0.0552s] [ 42%] 2025-12-04T14:00:07.9711570Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex64 PASSED [0.0557s] [ 42%] 2025-12-04T14:00:07.9711893Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float16 PASSED [0.0533s] [ 42%] 2025-12-04T14:00:07.9712220Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float32 PASSED [0.0531s] [ 42%] 2025-12-04T14:00:07.9712543Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float64 PASSED [0.0530s] [ 42%] 2025-12-04T14:00:07.9712862Z 
test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int16 PASSED [0.0445s] [ 42%] 2025-12-04T14:00:07.9713177Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int32 PASSED [0.0446s] [ 42%] 2025-12-04T14:00:07.9713492Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int64 PASSED [0.0449s] [ 42%] 2025-12-04T14:00:07.9713811Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int8 PASSED [0.0445s] [ 42%] 2025-12-04T14:00:07.9714222Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_uint8 PASSED [0.0445s] [ 42%] 2025-12-04T14:00:07.9714612Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bfloat16 PASSED [0.0532s] [ 42%] 2025-12-04T14:00:07.9714927Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bool PASSED [0.0445s] [ 42%] 2025-12-04T14:00:07.9715265Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex128 PASSED [0.0552s] [ 42%] 2025-12-04T14:00:07.9715606Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex64 PASSED [0.0557s] [ 42%] 2025-12-04T14:00:07.9715929Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float16 PASSED [0.0532s] [ 43%] 2025-12-04T14:00:07.9716254Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float32 PASSED [0.0528s] [ 43%] 2025-12-04T14:00:07.9716574Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float64 PASSED [0.0532s] [ 43%] 2025-12-04T14:00:07.9716892Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int16 PASSED [0.0446s] [ 43%] 2025-12-04T14:00:07.9717212Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int32 PASSED [0.0445s] [ 43%] 2025-12-04T14:00:07.9717526Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int64 PASSED [0.0452s] [ 43%] 2025-12-04T14:00:07.9717839Z 
test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int8 PASSED [0.0443s] [ 43%] 2025-12-04T14:00:07.9718220Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_uint8 PASSED [0.0445s] [ 43%] 2025-12-04T14:00:07.9718602Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bfloat16 PASSED [0.0532s] [ 43%] 2025-12-04T14:00:07.9718918Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bool PASSED [0.0445s] [ 43%] 2025-12-04T14:00:07.9719257Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex128 PASSED [0.0551s] [ 43%] 2025-12-04T14:00:07.9719594Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex64 PASSED [0.0556s] [ 43%] 2025-12-04T14:00:07.9719920Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float16 PASSED [0.0532s] [ 43%] 2025-12-04T14:00:07.9720243Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float32 PASSED [0.0531s] [ 43%] 2025-12-04T14:00:07.9720567Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float64 PASSED [0.0531s] [ 44%] 2025-12-04T14:00:07.9720884Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int16 PASSED [0.0446s] [ 44%] 2025-12-04T14:00:07.9721201Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int32 PASSED [0.0445s] [ 44%] 2025-12-04T14:00:07.9721517Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int64 PASSED [0.0448s] [ 44%] 2025-12-04T14:00:07.9721833Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int8 PASSED [0.0446s] [ 44%] 2025-12-04T14:00:07.9722150Z test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_uint8 PASSED [0.0446s] [ 44%] 2025-12-04T14:00:07.9722533Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bfloat16 PASSED [0.0724s] [ 44%] 2025-12-04T14:00:07.9722898Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bool PASSED [0.0555s] [ 44%] 2025-12-04T14:00:07.9723296Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex128 PASSED [0.0764s] [ 44%] 2025-12-04T14:00:07.9723681Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex64 PASSED [0.0768s] [ 44%] 2025-12-04T14:00:07.9724061Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float16 PASSED [0.0722s] [ 44%] 2025-12-04T14:00:07.9724434Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float32 PASSED [0.0724s] [ 44%] 2025-12-04T14:00:07.9724856Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float64 PASSED [0.0724s] [ 44%] 2025-12-04T14:00:07.9725264Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int16 PASSED [0.0555s] [ 44%] 2025-12-04T14:00:07.9725632Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int32 PASSED [0.0555s] [ 45%] 2025-12-04T14:00:07.9726001Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int64 PASSED [0.0559s] [ 45%] 2025-12-04T14:00:07.9726364Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int8 PASSED [0.0553s] [ 45%] 2025-12-04T14:00:07.9726735Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_uint8 PASSED [0.0553s] [ 45%] 2025-12-04T14:00:07.9727114Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bfloat16 PASSED [0.0721s] [ 45%] 2025-12-04T14:00:07.9727477Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bool PASSED [0.0553s] [ 45%] 2025-12-04T14:00:07.9727871Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex128 PASSED [0.0761s] [ 45%] 2025-12-04T14:00:07.9728256Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex64 PASSED [0.0767s] [ 45%] 2025-12-04T14:00:07.9728639Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float16 PASSED [0.0723s] [ 45%] 2025-12-04T14:00:07.9729103Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float32 PASSED [0.0722s] [ 45%] 2025-12-04T14:00:07.9729516Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float64 PASSED [0.0722s] [ 45%] 2025-12-04T14:00:07.9729879Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int16 PASSED [0.0554s] [ 45%] 2025-12-04T14:00:07.9730249Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int32 PASSED [0.0554s] [ 45%] 2025-12-04T14:00:07.9730613Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int64 PASSED [0.0556s] [ 45%] 2025-12-04T14:00:07.9730980Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int8 PASSED [0.0552s] [ 46%] 2025-12-04T14:00:07.9731344Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_uint8 PASSED [0.0553s] [ 46%] 2025-12-04T14:00:07.9731724Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bfloat16 PASSED [0.1153s] [ 46%] 2025-12-04T14:00:07.9732092Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bool PASSED [0.0987s] [ 46%] 2025-12-04T14:00:07.9732481Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex128 PASSED [0.1196s] [ 46%] 2025-12-04T14:00:07.9732869Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex64 PASSED [0.1202s] [ 46%] 2025-12-04T14:00:07.9733243Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float16 PASSED [0.1159s] [ 46%] 2025-12-04T14:00:07.9733617Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float32 PASSED [0.1157s] [ 46%] 2025-12-04T14:00:07.9733990Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float64 PASSED [0.1154s] [ 46%] 2025-12-04T14:00:07.9734356Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int16 PASSED [0.0996s] [ 46%] 2025-12-04T14:00:07.9734723Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int32 PASSED [0.0995s] [ 46%] 2025-12-04T14:00:07.9735093Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int64 PASSED [0.1001s] [ 46%] 2025-12-04T14:00:07.9735455Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int8 PASSED [0.0994s] [ 46%] 2025-12-04T14:00:07.9735867Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_uint8 PASSED [0.0994s] [ 46%] 2025-12-04T14:00:07.9736313Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bfloat16 PASSED [0.1154s] [ 47%] 2025-12-04T14:00:07.9736680Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bool PASSED [0.0988s] [ 47%] 2025-12-04T14:00:07.9737072Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex128 PASSED [0.1191s] [ 47%] 2025-12-04T14:00:07.9737455Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex64 PASSED [0.1192s] [ 47%] 2025-12-04T14:00:07.9737833Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float16 PASSED [0.1151s] [ 47%] 2025-12-04T14:00:07.9738204Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float32 PASSED [0.1150s] [ 47%] 2025-12-04T14:00:07.9738588Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float64 PASSED [0.1150s] [ 47%] 2025-12-04T14:00:07.9739003Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int16 PASSED [0.0987s] [ 47%] 2025-12-04T14:00:07.9739435Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int32 PASSED [0.0988s] [ 47%] 2025-12-04T14:00:07.9739804Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int64 PASSED [0.0989s] [ 47%] 2025-12-04T14:00:07.9740212Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int8 PASSED [0.0987s] [ 47%] 2025-12-04T14:00:07.9740622Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_uint8 PASSED [0.0988s] [ 47%] 2025-12-04T14:00:07.9740998Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bfloat16 PASSED [0.1080s] [ 47%] 2025-12-04T14:00:07.9741365Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bool PASSED [0.0900s] [ 47%] 2025-12-04T14:00:07.9741758Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex128 PASSED [0.1096s] [ 48%] 2025-12-04T14:00:07.9742146Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex64 PASSED [0.1099s] [ 48%] 2025-12-04T14:00:07.9742522Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float16 PASSED [0.1062s] [ 48%] 2025-12-04T14:00:07.9742895Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float32 PASSED [0.1062s] [ 48%] 2025-12-04T14:00:07.9743269Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float64 PASSED [0.1058s] [ 48%] 2025-12-04T14:00:07.9743636Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int16 PASSED [0.0901s] [ 48%] 2025-12-04T14:00:07.9743999Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int32 PASSED [0.0898s] [ 48%] 2025-12-04T14:00:07.9744369Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int64 PASSED [0.0907s] [ 48%] 2025-12-04T14:00:07.9744734Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int8 PASSED [0.0901s] [ 48%] 2025-12-04T14:00:07.9745098Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_uint8 PASSED [0.0896s] [ 48%] 2025-12-04T14:00:07.9745483Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bfloat16 PASSED [0.1058s] [ 48%] 2025-12-04T14:00:07.9745844Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bool PASSED [0.0897s] [ 48%] 2025-12-04T14:00:07.9746247Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex128 PASSED [0.1092s] [ 48%] 2025-12-04T14:00:07.9746632Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex64 PASSED [0.1098s] [ 48%] 2025-12-04T14:00:07.9747051Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float16 PASSED [0.1055s] [ 49%] 2025-12-04T14:00:07.9747468Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float32 PASSED [0.1053s] [ 49%] 2025-12-04T14:00:07.9747842Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float64 PASSED [0.1051s] [ 49%] 2025-12-04T14:00:07.9748210Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int16 PASSED [0.0890s] [ 49%] 2025-12-04T14:00:07.9748577Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int32 PASSED [0.0889s] [ 49%] 2025-12-04T14:00:07.9748944Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int64 PASSED [0.0897s] [ 49%] 2025-12-04T14:00:07.9749312Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int8 PASSED [0.0891s] [ 49%] 2025-12-04T14:00:07.9749680Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_uint8 PASSED [0.0897s] [ 49%] 2025-12-04T14:00:07.9750119Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bfloat16 SKIPPED [0.0028s] (NOT IMPL) [ 49%] 2025-12-04T14:00:07.9750535Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 49%] 2025-12-04T14:00:07.9751017Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 49%] 2025-12-04T14:00:07.9751458Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 49%] 2025-12-04T14:00:07.9751928Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float16 SKIPPED [0.0026s] (NOT IMPL) [ 49%] 2025-12-04T14:00:07.9752355Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9752780Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9753198Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9753616Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9754034Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int64 SKIPPED [0.0025s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9754458Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9754874Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9755305Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bfloat16 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9755723Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9756162Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex128 SKIPPED [0.0025s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9756598Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9757023Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9757448Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9757873Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 50%] 2025-12-04T14:00:07.9758337Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int16 SKIPPED [0.0024s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9758798Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9759213Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9759629Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9760047Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9760477Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bfloat16 SKIPPED [0.0025s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9760891Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9761329Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9761771Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9762196Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9762664Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float32 SKIPPED [0.0025s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9763129Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9763543Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9763957Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 51%] 2025-12-04T14:00:07.9764374Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9764789Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int8 SKIPPED [0.0024s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9765206Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9765636Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bfloat16 SKIPPED [0.0021s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9766049Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9766490Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9766923Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex64 SKIPPED [0.0024s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9767351Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float16 SKIPPED [0.0020s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9767775Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9768199Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float64 SKIPPED [0.0020s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9768616Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9769031Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int32 SKIPPED [0.0024s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9769445Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9769903Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 52%] 2025-12-04T14:00:07.9770359Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 53%] 2025-12-04T14:00:07.9770742Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bfloat16 PASSED [0.1164s] [ 53%] 2025-12-04T14:00:07.9771108Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bool PASSED [0.0996s] [ 53%] 2025-12-04T14:00:07.9771501Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex128 PASSED [0.1191s] [ 53%] 2025-12-04T14:00:07.9771889Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex64 PASSED [0.1196s] [ 53%] 2025-12-04T14:00:07.9772260Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float16 PASSED [0.1161s] [ 53%] 2025-12-04T14:00:07.9772640Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float32 PASSED [0.1160s] [ 53%] 2025-12-04T14:00:07.9773013Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float64 PASSED [0.1163s] [ 53%] 2025-12-04T14:00:07.9773380Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int16 PASSED [0.0995s] [ 53%] 2025-12-04T14:00:07.9773747Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int32 PASSED [0.0995s] [ 53%] 2025-12-04T14:00:07.9774235Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int64 PASSED [0.0997s] [ 53%] 2025-12-04T14:00:07.9774654Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int8 PASSED [0.0999s] [ 53%] 2025-12-04T14:00:07.9775020Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_uint8 PASSED [0.0997s] [ 53%] 2025-12-04T14:00:07.9775402Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bfloat16 PASSED [0.1157s] [ 53%] 2025-12-04T14:00:07.9775766Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bool PASSED [0.0987s] [ 54%] 2025-12-04T14:00:07.9776157Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex128 PASSED [0.1191s] [ 54%] 2025-12-04T14:00:07.9776545Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex64 PASSED [0.1188s] [ 54%] 2025-12-04T14:00:07.9776918Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float16 PASSED [0.1150s] [ 54%] 2025-12-04T14:00:07.9777294Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float32 PASSED [0.1153s] [ 54%] 2025-12-04T14:00:07.9777664Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float64 PASSED [0.1157s] [ 54%] 2025-12-04T14:00:07.9778029Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int16 PASSED [0.0987s] [ 54%] 2025-12-04T14:00:07.9778396Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int32 PASSED [0.0989s] [ 54%] 2025-12-04T14:00:07.9778762Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int64 PASSED [0.0985s] [ 54%] 2025-12-04T14:00:07.9779182Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int8 PASSED [0.0988s] [ 54%] 2025-12-04T14:00:07.9779550Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_uint8 PASSED [0.0987s] [ 54%] 2025-12-04T14:00:07.9779930Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bfloat16 PASSED [0.0728s] [ 54%] 2025-12-04T14:00:07.9780294Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bool PASSED [0.0553s] [ 54%] 2025-12-04T14:00:07.9780685Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex128 PASSED [0.0763s] [ 54%] 2025-12-04T14:00:07.9781149Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex64 PASSED [0.0762s] [ 55%] 2025-12-04T14:00:07.9781561Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float16 PASSED [0.0723s] [ 55%] 2025-12-04T14:00:07.9781938Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float32 PASSED [0.0724s] [ 55%] 2025-12-04T14:00:07.9782314Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float64 PASSED [0.0727s] [ 55%] 2025-12-04T14:00:07.9782678Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int16 PASSED [0.0553s] [ 55%] 2025-12-04T14:00:07.9783051Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int32 PASSED [0.0553s] [ 55%] 2025-12-04T14:00:07.9783414Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int64 PASSED [0.0555s] [ 55%] 2025-12-04T14:00:07.9783777Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int8 PASSED [0.0554s] [ 55%] 2025-12-04T14:00:07.9784145Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_uint8 PASSED [0.0554s] [ 55%] 2025-12-04T14:00:07.9784523Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bfloat16 PASSED [0.0729s] [ 55%] 2025-12-04T14:00:07.9784884Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bool PASSED [0.0555s] [ 55%] 2025-12-04T14:00:07.9785315Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex128 PASSED [0.0763s] [ 55%] 2025-12-04T14:00:07.9785736Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex64 PASSED [0.0761s] [ 55%] 2025-12-04T14:00:07.9786113Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float16 PASSED [0.0722s] [ 55%] 2025-12-04T14:00:07.9786486Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float32 PASSED [0.0723s] [ 56%] 2025-12-04T14:00:07.9786859Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float64 PASSED [0.0726s] [ 56%] 2025-12-04T14:00:07.9787226Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int16 PASSED [0.0552s] [ 56%] 2025-12-04T14:00:07.9787589Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int32 PASSED [0.0554s] [ 56%] 2025-12-04T14:00:07.9787957Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int64 PASSED [0.0552s] [ 56%] 2025-12-04T14:00:07.9788321Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int8 PASSED [0.0551s] [ 56%] 2025-12-04T14:00:07.9788736Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_uint8 PASSED [0.0552s] [ 56%] 2025-12-04T14:00:07.9789117Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bfloat16 PASSED [0.1079s] [ 56%] 2025-12-04T14:00:07.9789479Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bool PASSED [0.0915s] [ 56%] 2025-12-04T14:00:07.9789872Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex128 PASSED [0.1112s] [ 56%] 2025-12-04T14:00:07.9790256Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex64 PASSED [0.1113s] [ 56%] 2025-12-04T14:00:07.9790628Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float16 PASSED [0.1079s] [ 56%] 2025-12-04T14:00:07.9791003Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float32 PASSED [0.1077s] [ 56%] 2025-12-04T14:00:07.9791373Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float64 PASSED [0.1082s] [ 56%] 2025-12-04T14:00:07.9791738Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int16 PASSED [0.0917s] [ 57%] 2025-12-04T14:00:07.9792146Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int32 PASSED [0.0916s] [ 57%] 2025-12-04T14:00:07.9792550Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int64 PASSED [0.0917s] [ 57%] 2025-12-04T14:00:07.9792915Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int8 PASSED [0.0916s] [ 57%] 2025-12-04T14:00:07.9793281Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_uint8 PASSED [0.0918s] [ 57%] 2025-12-04T14:00:07.9793659Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bfloat16 PASSED [0.1081s] [ 57%] 2025-12-04T14:00:07.9794023Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bool PASSED [0.0912s] [ 57%] 2025-12-04T14:00:07.9794411Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex128 PASSED [0.1110s] [ 57%] 2025-12-04T14:00:07.9794799Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex64 PASSED [0.1102s] [ 57%] 2025-12-04T14:00:07.9795172Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float16 PASSED [0.1067s] [ 57%] 2025-12-04T14:00:07.9795548Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float32 PASSED [0.1066s] [ 57%] 2025-12-04T14:00:07.9795960Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float64 PASSED [0.1076s] [ 57%] 2025-12-04T14:00:07.9796326Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int16 PASSED [0.0914s] [ 57%] 2025-12-04T14:00:07.9796737Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int32 PASSED [0.0912s] [ 57%] 2025-12-04T14:00:07.9797099Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int64 PASSED [0.0907s] [ 58%] 2025-12-04T14:00:07.9797466Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int8 PASSED [0.0914s] [ 58%] 2025-12-04T14:00:07.9797833Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_uint8 PASSED [0.0909s] [ 58%] 2025-12-04T14:00:07.9798264Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bfloat16 SKIPPED [0.0023s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9798720Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bool SKIPPED [0.0030s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9799174Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9799614Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9800037Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9800461Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9800888Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float64 SKIPPED [0.0025s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9801301Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9801719Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9802131Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9802546Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 58%] 2025-12-04T14:00:07.9802963Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_uint8 SKIPPED [0.0025s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9803435Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bfloat16 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9803889Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9804328Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex128 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9804763Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9805190Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float16 SKIPPED [0.0025s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9805614Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9806041Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9806457Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9806870Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9807287Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int64 SKIPPED [0.0024s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9808518Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9809058Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9809488Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bfloat16 SKIPPED [0.0021s] (NOT IMPL) [ 59%] 2025-12-04T14:00:07.9809903Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9810343Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex128 SKIPPED [0.0024s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9810775Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9811203Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9811625Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float32 SKIPPED [0.0021s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9812053Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9812473Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int16 SKIPPED [0.0025s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9812890Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9813307Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int64 SKIPPED [0.0020s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9813718Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int8 SKIPPED [0.0021s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9814136Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9814570Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bfloat16 SKIPPED [0.0024s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9814982Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bool SKIPPED [0.0021s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9815514Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex128 SKIPPED [0.0020s] (NOT IMPL) [ 60%] 2025-12-04T14:00:07.9816031Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex64 SKIPPED [0.0021s] (NOT IMPL) [ 61%] 2025-12-04T14:00:07.9816487Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float16 SKIPPED [0.0021s] (NOT IMPL) [ 61%] 2025-12-04T14:00:07.9816942Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float32 SKIPPED [0.0024s] (NOT IMPL) [ 61%] 2025-12-04T14:00:07.9817397Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float64 SKIPPED [0.0021s] (NOT IMPL) [ 61%] 2025-12-04T14:00:07.9817845Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int16 SKIPPED [0.0021s] (NOT IMPL) [ 61%] 2025-12-04T14:00:07.9818289Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int32 SKIPPED [0.0021s] (NOT IMPL) [ 61%] 2025-12-04T14:00:07.9818735Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int64 SKIPPED [0.0021s] (NOT IMPL) [ 61%] 2025-12-04T14:00:07.9819229Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int8 SKIPPED [0.0025s] (NOT IMPL) [ 61%] 2025-12-04T14:00:07.9819647Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_uint8 SKIPPED [0.0021s] (NOT IMPL) [ 61%] 2025-12-04T14:00:07.9820099Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bfloat16 PASSED [0.0717s] [ 61%] 2025-12-04T14:00:07.9820467Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bool PASSED [0.0633s] [ 61%] 2025-12-04T14:00:07.9820928Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex128 PASSED [0.0732s] [ 61%] 2025-12-04T14:00:07.9821327Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex64 PASSED [0.0737s] [ 61%] 2025-12-04T14:00:07.9821708Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float16 PASSED [0.0715s] [ 61%] 2025-12-04T14:00:07.9822093Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float32 PASSED [0.0715s] [ 62%] 2025-12-04T14:00:07.9822467Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float64 PASSED [0.0713s] [ 62%] 2025-12-04T14:00:07.9822838Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int16 PASSED [0.0630s] [ 62%] 2025-12-04T14:00:07.9823218Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int32 PASSED [0.0630s] [ 62%] 2025-12-04T14:00:07.9823590Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int64 PASSED [0.0636s] [ 62%] 2025-12-04T14:00:07.9823967Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int8 PASSED [0.0630s] [ 62%] 2025-12-04T14:00:07.9824336Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_uint8 PASSED [0.0631s] [ 62%] 2025-12-04T14:00:07.9824724Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bfloat16 PASSED [0.0708s] [ 62%] 2025-12-04T14:00:07.9825105Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bool PASSED [0.0626s] [ 62%] 2025-12-04T14:00:07.9825509Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex128 PASSED [0.0727s] [ 62%] 2025-12-04T14:00:07.9825904Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex64 PASSED [0.0735s] [ 62%] 2025-12-04T14:00:07.9826280Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float16 PASSED [0.0708s] [ 62%] 2025-12-04T14:00:07.9826656Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float32 PASSED [0.0708s] [ 62%] 2025-12-04T14:00:07.9827038Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float64 PASSED [0.0708s] [ 63%] 2025-12-04T14:00:07.9827453Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int16 PASSED [0.0623s] [ 63%] 2025-12-04T14:00:07.9827870Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int32 PASSED [0.0625s] [ 63%] 2025-12-04T14:00:07.9828239Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int64 PASSED [0.0631s] [ 63%] 2025-12-04T14:00:07.9828605Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int8 PASSED [0.0624s] [ 63%] 2025-12-04T14:00:07.9828978Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_uint8 PASSED [0.0627s] [ 63%] 2025-12-04T14:00:07.9829362Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bfloat16 PASSED [0.0679s] [ 63%] 2025-12-04T14:00:07.9829731Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bool PASSED [0.0596s] [ 63%] 2025-12-04T14:00:07.9830129Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex128 PASSED [0.0697s] [ 63%] 2025-12-04T14:00:07.9830519Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex64 PASSED [0.0701s] [ 63%] 2025-12-04T14:00:07.9830903Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float16 PASSED [0.0676s] [ 63%] 2025-12-04T14:00:07.9831319Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float32 PASSED [0.0677s] [ 63%] 2025-12-04T14:00:07.9831701Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float64 PASSED [0.0676s] [ 63%] 2025-12-04T14:00:07.9832107Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int16 PASSED [0.0595s] [ 63%] 2025-12-04T14:00:07.9832478Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int32 PASSED [0.0593s] [ 64%] 2025-12-04T14:00:07.9832855Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int64 PASSED [0.0599s] [ 64%] 2025-12-04T14:00:07.9833223Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int8 PASSED [0.0594s] [ 64%] 2025-12-04T14:00:07.9833600Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_uint8 PASSED [0.0594s] [ 64%] 2025-12-04T14:00:07.9833979Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bfloat16 PASSED [0.0672s] [ 64%] 2025-12-04T14:00:07.9834345Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bool PASSED [0.0590s] [ 64%] 2025-12-04T14:00:07.9834746Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex128 PASSED [0.0689s] [ 64%] 2025-12-04T14:00:07.9835134Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex64 PASSED [0.0692s] [ 64%] 2025-12-04T14:00:07.9835515Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float16 PASSED [0.0671s] [ 64%] 2025-12-04T14:00:07.9835894Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float32 PASSED [0.0673s] [ 64%] 2025-12-04T14:00:07.9836269Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float64 PASSED [0.0671s] [ 64%] 2025-12-04T14:00:07.9836648Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int16 PASSED [0.0590s] [ 64%] 2025-12-04T14:00:07.9837020Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int32 PASSED [0.0590s] [ 64%] 2025-12-04T14:00:07.9837393Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int64 PASSED [0.0593s] [ 64%] 2025-12-04T14:00:07.9837759Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int8 PASSED [0.0589s] [ 65%] 2025-12-04T14:00:07.9838125Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_uint8 PASSED [0.0590s] [ 65%] 2025-12-04T14:00:07.9838562Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bfloat16 PASSED [0.0618s] [ 65%] 2025-12-04T14:00:07.9839015Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bool PASSED [0.0470s] [ 65%] 2025-12-04T14:00:07.9839413Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex128 PASSED [0.0652s] [ 65%] 2025-12-04T14:00:07.9839802Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex64 PASSED [0.0657s] [ 65%] 2025-12-04T14:00:07.9840177Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float16 PASSED [0.0618s] [ 65%] 2025-12-04T14:00:07.9840558Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float32 PASSED [0.0617s] [ 65%] 2025-12-04T14:00:07.9840931Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float64 PASSED [0.0617s] [ 65%] 2025-12-04T14:00:07.9841307Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int16 PASSED [0.0470s] [ 65%] 2025-12-04T14:00:07.9841677Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int32 PASSED [0.0468s] [ 65%] 2025-12-04T14:00:07.9842043Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int64 PASSED [0.0472s] [ 65%] 2025-12-04T14:00:07.9842466Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int8 PASSED [0.0467s] [ 65%] 2025-12-04T14:00:07.9842835Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_uint8 PASSED [0.0469s] [ 65%] 2025-12-04T14:00:07.9843262Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bfloat16 PASSED [0.0614s] [ 66%] 2025-12-04T14:00:07.9843631Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bool PASSED [0.0462s] [ 66%] 2025-12-04T14:00:07.9844030Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex128 PASSED [0.0647s] [ 66%] 2025-12-04T14:00:07.9844422Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex64 PASSED [0.0652s] [ 66%] 2025-12-04T14:00:07.9844797Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float16 PASSED [0.0613s] [ 66%] 2025-12-04T14:00:07.9845172Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float32 PASSED [0.0613s] [ 66%] 2025-12-04T14:00:07.9845555Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float64 PASSED [0.0612s] [ 66%] 2025-12-04T14:00:07.9845923Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int16 PASSED [0.0462s] [ 66%] 2025-12-04T14:00:07.9846299Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int32 PASSED [0.0464s] [ 66%] 2025-12-04T14:00:07.9846674Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int64 PASSED [0.0467s] [ 66%] 2025-12-04T14:00:07.9847039Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int8 PASSED [0.0461s] [ 66%] 2025-12-04T14:00:07.9847419Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_uint8 PASSED [0.0462s] [ 66%] 2025-12-04T14:00:07.9847800Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bfloat16 PASSED [0.0630s] [ 66%] 2025-12-04T14:00:07.9848173Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bool PASSED [0.0548s] [ 66%] 2025-12-04T14:00:07.9848572Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex128 PASSED [0.0647s] [ 67%] 2025-12-04T14:00:07.9848958Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex64 PASSED [0.0653s] [ 67%] 2025-12-04T14:00:07.9849341Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float16 PASSED [0.0630s] [ 67%] 2025-12-04T14:00:07.9849761Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float32 PASSED [0.0630s] [ 67%] 2025-12-04T14:00:07.9850179Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float64 PASSED [0.0629s] [ 67%] 2025-12-04T14:00:07.9850546Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int16 PASSED [0.0548s] [ 67%] 2025-12-04T14:00:07.9850913Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int32 PASSED [0.0548s] [ 67%] 2025-12-04T14:00:07.9851288Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int64 PASSED [0.0555s] [ 67%] 2025-12-04T14:00:07.9851656Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int8 PASSED [0.0547s] [ 67%] 2025-12-04T14:00:07.9852026Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_uint8 PASSED [0.0547s] [ 67%] 2025-12-04T14:00:07.9852408Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bfloat16 PASSED [0.0625s] [ 67%] 2025-12-04T14:00:07.9852775Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bool PASSED [0.0542s] [ 67%] 2025-12-04T14:00:07.9853171Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex128 PASSED [0.0642s] [ 67%] 2025-12-04T14:00:07.9853599Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex64 PASSED [0.0650s] [ 67%] 2025-12-04T14:00:07.9853978Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float16 PASSED [0.0625s] [ 68%] 2025-12-04T14:00:07.9854531Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float32 PASSED [0.0625s] [ 68%] 2025-12-04T14:00:07.9854906Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float64 PASSED [0.0625s] [ 68%] 2025-12-04T14:00:07.9855281Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int16 PASSED [0.0542s] [ 68%] 2025-12-04T14:00:07.9855651Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int32 PASSED [0.0542s] [ 68%] 2025-12-04T14:00:07.9856022Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int64 PASSED [0.0545s] [ 68%] 2025-12-04T14:00:07.9856385Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int8 PASSED [0.0541s] [ 68%] 2025-12-04T14:00:07.9856755Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_uint8 PASSED [0.0541s] [ 68%] 2025-12-04T14:00:07.9857146Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bfloat16 PASSED [0.0593s] [ 68%] 2025-12-04T14:00:07.9857509Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bool PASSED [0.0508s] [ 68%] 2025-12-04T14:00:07.9857908Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex128 PASSED [0.0610s] [ 68%] 2025-12-04T14:00:07.9858296Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex64 PASSED [0.0614s] [ 68%] 2025-12-04T14:00:07.9858699Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float16 PASSED [0.0593s] [ 68%] 2025-12-04T14:00:07.9859148Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float32 PASSED [0.0592s] [ 68%] 2025-12-04T14:00:07.9859526Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float64 PASSED [0.0591s] [ 69%] 2025-12-04T14:00:07.9859901Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int16 PASSED [0.0510s] [ 69%] 2025-12-04T14:00:07.9860268Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int32 PASSED [0.0509s] [ 69%] 2025-12-04T14:00:07.9860633Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int64 PASSED [0.0513s] [ 69%] 2025-12-04T14:00:07.9861050Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int8 PASSED [0.0509s] [ 69%] 2025-12-04T14:00:07.9861460Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_uint8 PASSED [0.0510s] [ 69%] 2025-12-04T14:00:07.9861844Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bfloat16 PASSED [0.0588s] [ 69%] 2025-12-04T14:00:07.9862210Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bool PASSED [0.0505s] [ 69%] 2025-12-04T14:00:07.9862604Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex128 PASSED [0.0605s] [ 69%] 2025-12-04T14:00:07.9862999Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex64 PASSED [0.0610s] [ 69%] 2025-12-04T14:00:07.9863371Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float16 PASSED [0.0587s] [ 69%] 2025-12-04T14:00:07.9863751Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float32 PASSED [0.0587s] [ 69%] 2025-12-04T14:00:07.9864124Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float64 PASSED [0.0586s] [ 69%] 2025-12-04T14:00:07.9864489Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int16 PASSED [0.0505s] [ 69%] 2025-12-04T14:00:07.9864927Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int32 PASSED [0.0504s] [ 70%] 2025-12-04T14:00:07.9865295Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int64 PASSED [0.0508s] [ 70%] 2025-12-04T14:00:07.9865699Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int8 PASSED [0.0504s] [ 70%] 2025-12-04T14:00:07.9866069Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_uint8 PASSED [0.0505s] [ 70%] 2025-12-04T14:00:07.9866450Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bfloat16 PASSED [0.0603s] [ 70%] 2025-12-04T14:00:07.9866824Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bool PASSED [0.0517s] [ 70%] 2025-12-04T14:00:07.9867215Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex128 PASSED [0.0619s] [ 70%] 2025-12-04T14:00:07.9867601Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex64 PASSED [0.0624s] [ 70%] 2025-12-04T14:00:07.9867979Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float16 PASSED [0.0600s] [ 70%] 2025-12-04T14:00:07.9868364Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float32 PASSED [0.0599s] [ 70%] 2025-12-04T14:00:07.9868787Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float64 PASSED [0.0600s] [ 70%] 2025-12-04T14:00:07.9869155Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int16 PASSED [0.0517s] [ 70%] 2025-12-04T14:00:07.9869524Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int32 PASSED [0.0515s] [ 70%] 2025-12-04T14:00:07.9869898Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int64 PASSED [0.0521s] [ 70%] 2025-12-04T14:00:07.9870262Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int8 PASSED [0.0516s] [ 71%] 2025-12-04T14:00:07.9870635Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_uint8 PASSED [0.0517s] [ 71%] 2025-12-04T14:00:07.9871016Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bfloat16 PASSED [0.0599s] [ 71%] 2025-12-04T14:00:07.9871381Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bool PASSED [0.0516s] [ 71%] 2025-12-04T14:00:07.9871781Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex128 PASSED [0.0617s] [ 71%] 2025-12-04T14:00:07.9872219Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex64 PASSED [0.0622s] [ 71%] 2025-12-04T14:00:07.9872642Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float16 PASSED [0.0598s] [ 71%] 2025-12-04T14:00:07.9873016Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float32 PASSED [0.0599s] [ 71%] 2025-12-04T14:00:07.9873391Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float64 PASSED [0.0599s] [ 71%] 2025-12-04T14:00:07.9873761Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int16 PASSED [0.0516s] [ 71%] 2025-12-04T14:00:07.9874129Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int32 PASSED [0.0516s] [ 71%] 2025-12-04T14:00:07.9874499Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int64 PASSED [0.0520s] [ 71%] 2025-12-04T14:00:07.9874866Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int8 PASSED [0.0516s] [ 71%] 2025-12-04T14:00:07.9875230Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_uint8 PASSED [0.0516s] [ 71%] 2025-12-04T14:00:07.9875618Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bfloat16 PASSED [0.0725s] [ 72%] 2025-12-04T14:00:07.9876028Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bool PASSED [0.0642s] [ 72%] 2025-12-04T14:00:07.9876429Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex128 PASSED [0.0743s] [ 72%] 2025-12-04T14:00:07.9876854Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex64 PASSED [0.0748s] [ 72%] 2025-12-04T14:00:07.9877229Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float16 PASSED [0.0724s] [ 72%] 2025-12-04T14:00:07.9877609Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float32 PASSED [0.0724s] [ 72%] 2025-12-04T14:00:07.9877984Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float64 PASSED [0.0724s] [ 72%] 2025-12-04T14:00:07.9878358Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int16 PASSED [0.0642s] [ 72%] 2025-12-04T14:00:07.9878777Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int32 PASSED [0.0640s] [ 72%] 2025-12-04T14:00:07.9879143Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int64 PASSED [0.0644s] [ 72%] 2025-12-04T14:00:07.9879515Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int8 PASSED [0.0641s] [ 72%] 2025-12-04T14:00:07.9879880Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_uint8 PASSED [0.0641s] [ 72%] 2025-12-04T14:00:07.9880273Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bfloat16 PASSED [0.0719s] [ 72%] 2025-12-04T14:00:07.9880638Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bool PASSED [0.0635s] [ 72%] 2025-12-04T14:00:07.9881032Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex128 PASSED [0.0736s] [ 73%] 2025-12-04T14:00:07.9881429Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex64 PASSED [0.0744s] [ 73%] 2025-12-04T14:00:07.9881806Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float16 PASSED [0.0717s] [ 73%] 2025-12-04T14:00:07.9882189Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float32 PASSED [0.0719s] [ 73%] 2025-12-04T14:00:07.9882565Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float64 PASSED [0.0719s] [ 73%] 2025-12-04T14:00:07.9882979Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int16 PASSED [0.0635s] [ 73%] 2025-12-04T14:00:07.9883355Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int32 PASSED [0.0636s] [ 73%] 2025-12-04T14:00:07.9883762Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int64 PASSED [0.0638s] [ 73%] 2025-12-04T14:00:07.9884132Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int8 PASSED [0.0635s] [ 73%] 2025-12-04T14:00:07.9884505Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_uint8 PASSED [0.0634s] [ 73%] 2025-12-04T14:00:07.9884890Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bfloat16 PASSED [0.0810s] [ 73%] 2025-12-04T14:00:07.9885258Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bool PASSED [0.0661s] [ 73%] 2025-12-04T14:00:07.9885653Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex128 PASSED [0.0840s] [ 73%] 2025-12-04T14:00:07.9886047Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex64 PASSED [0.0846s] [ 73%] 2025-12-04T14:00:07.9886427Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float16 PASSED [0.0808s] [ 74%] 2025-12-04T14:00:07.9886802Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float32 PASSED [0.0806s] [ 74%] 2025-12-04T14:00:07.9887224Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float64 PASSED [0.0806s] [ 74%] 2025-12-04T14:00:07.9887596Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int16 PASSED [0.0660s] [ 74%] 2025-12-04T14:00:07.9888005Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int32 PASSED [0.0660s] [ 74%] 2025-12-04T14:00:07.9888380Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int64 PASSED [0.0665s] [ 74%] 2025-12-04T14:00:07.9888748Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int8 PASSED [0.0661s] [ 74%] 2025-12-04T14:00:07.9889123Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_uint8 PASSED [0.0661s] [ 74%] 2025-12-04T14:00:07.9889502Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bfloat16 PASSED [0.0806s] [ 74%] 2025-12-04T14:00:07.9889867Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bool PASSED [0.0658s] [ 74%] 2025-12-04T14:00:07.9890264Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex128 PASSED [0.0832s] [ 74%] 2025-12-04T14:00:07.9890652Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex64 PASSED [0.0841s] [ 74%] 2025-12-04T14:00:07.9891035Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float16 PASSED [0.0806s] [ 74%] 2025-12-04T14:00:07.9891412Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float32 PASSED [0.0807s] [ 75%] 2025-12-04T14:00:07.9891788Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float64 PASSED [0.0808s] [ 75%] 2025-12-04T14:00:07.9892163Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int16 PASSED [0.0662s] [ 75%] 2025-12-04T14:00:07.9892535Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int32 PASSED [0.0660s] [ 75%] 2025-12-04T14:00:07.9892907Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int64 PASSED [0.0664s] [ 75%] 2025-12-04T14:00:07.9893275Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int8 PASSED [0.0655s] [ 75%] 2025-12-04T14:00:07.9893644Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_uint8 PASSED [0.0659s] [ 75%] 2025-12-04T14:00:07.9894029Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bfloat16 PASSED [0.0657s] [ 75%] 2025-12-04T14:00:07.9894438Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bool PASSED [0.0506s] [ 75%] 2025-12-04T14:00:07.9894878Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex128 PASSED [0.0691s] [ 75%] 2025-12-04T14:00:07.9895268Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex64 PASSED [0.0696s] [ 75%] 2025-12-04T14:00:07.9895644Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float16 PASSED [0.0658s] [ 75%] 2025-12-04T14:00:07.9896026Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float32 PASSED [0.0659s] [ 75%] 2025-12-04T14:00:07.9896399Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float64 PASSED [0.0657s] [ 75%] 2025-12-04T14:00:07.9896774Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int16 PASSED [0.0505s] [ 76%] 2025-12-04T14:00:07.9897146Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int32 PASSED [0.0503s] [ 76%] 2025-12-04T14:00:07.9897515Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int64 PASSED [0.0508s] [ 76%] 2025-12-04T14:00:07.9897883Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int8 PASSED [0.0504s] [ 76%] 2025-12-04T14:00:07.9898293Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_uint8 PASSED [0.0506s] [ 76%] 2025-12-04T14:00:07.9898723Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bfloat16 PASSED [0.0657s] [ 76%] 2025-12-04T14:00:07.9899179Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bool PASSED [0.0503s] [ 76%] 2025-12-04T14:00:07.9899576Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex128 PASSED [0.0692s] [ 76%] 2025-12-04T14:00:07.9899973Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex64 PASSED [0.0696s] [ 76%] 2025-12-04T14:00:07.9900349Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float16 PASSED [0.0656s] [ 76%] 2025-12-04T14:00:07.9900729Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float32 PASSED [0.0656s] [ 76%] 2025-12-04T14:00:07.9901106Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float64 PASSED [0.0655s] [ 76%] 2025-12-04T14:00:07.9901472Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int16 PASSED [0.0502s] [ 76%] 2025-12-04T14:00:07.9901846Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int32 PASSED [0.0498s] [ 76%] 2025-12-04T14:00:07.9902210Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int64 PASSED [0.0506s] [ 77%] 2025-12-04T14:00:07.9902580Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int8 PASSED [0.0502s] [ 77%] 2025-12-04T14:00:07.9902949Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_uint8 PASSED [0.0503s] [ 77%] 2025-12-04T14:00:07.9903327Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bfloat16 PASSED [0.1075s] [ 77%] 2025-12-04T14:00:07.9903697Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bool PASSED [0.0925s] [ 77%] 2025-12-04T14:00:07.9904090Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex128 PASSED [0.1107s] [ 77%] 2025-12-04T14:00:07.9904489Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex64 PASSED [0.1111s] [ 77%] 2025-12-04T14:00:07.9904864Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float16 PASSED [0.1072s] [ 77%] 2025-12-04T14:00:07.9905300Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float32 PASSED [0.1075s] [ 77%] 2025-12-04T14:00:07.9905677Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float64 PASSED [0.1071s] [ 77%] 2025-12-04T14:00:07.9906120Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int16 PASSED [0.0923s] [ 77%] 2025-12-04T14:00:07.9906496Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int32 PASSED [0.0919s] [ 77%] 2025-12-04T14:00:07.9906864Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int64 PASSED [0.0926s] [ 77%] 2025-12-04T14:00:07.9907229Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int8 PASSED [0.0920s] [ 77%] 2025-12-04T14:00:07.9907600Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_uint8 PASSED [0.0921s] [ 78%] 2025-12-04T14:00:07.9908239Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bfloat16 PASSED [0.1068s] [ 78%] 2025-12-04T14:00:07.9908632Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bool PASSED [0.0917s] [ 78%] 2025-12-04T14:00:07.9909029Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex128 PASSED [0.1101s] [ 78%] 2025-12-04T14:00:07.9909418Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex64 PASSED [0.1107s] [ 78%] 2025-12-04T14:00:07.9909883Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float16 PASSED [0.1062s] [ 78%] 2025-12-04T14:00:07.9910258Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float32 PASSED [0.1065s] [ 78%] 2025-12-04T14:00:07.9910685Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float64 PASSED [0.1066s] [ 78%] 2025-12-04T14:00:07.9911056Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int16 PASSED [0.0912s] [ 78%] 2025-12-04T14:00:07.9911426Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int32 PASSED [0.0917s] [ 78%] 2025-12-04T14:00:07.9911797Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int64 PASSED [0.0919s] [ 78%] 2025-12-04T14:00:07.9912161Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int8 PASSED [0.0916s] [ 78%] 2025-12-04T14:00:07.9912526Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_uint8 PASSED [0.0917s] [ 78%] 2025-12-04T14:00:07.9912910Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bfloat16 PASSED [0.0726s] [ 78%] 2025-12-04T14:00:07.9913275Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bool PASSED [0.0642s] [ 79%] 2025-12-04T14:00:07.9913673Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex128 PASSED [0.0744s] [ 79%] 2025-12-04T14:00:07.9914063Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex64 PASSED [0.0749s] [ 79%] 2025-12-04T14:00:07.9914442Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float16 PASSED [0.0725s] [ 79%] 2025-12-04T14:00:07.9914821Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float32 PASSED [0.0726s] [ 79%] 2025-12-04T14:00:07.9915193Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float64 PASSED [0.0724s] [ 79%] 2025-12-04T14:00:07.9915574Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int16 PASSED [0.0643s] [ 79%] 2025-12-04T14:00:07.9915942Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int32 PASSED [0.0640s] [ 79%] 2025-12-04T14:00:07.9916309Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int64 PASSED [0.0646s] [ 79%] 2025-12-04T14:00:07.9916735Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int8 PASSED [0.0640s] [ 79%] 2025-12-04T14:00:07.9917104Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_uint8 PASSED [0.0640s] [ 79%] 2025-12-04T14:00:07.9917544Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bfloat16 PASSED [0.0718s] [ 79%] 2025-12-04T14:00:07.9917910Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bool PASSED [0.0635s] [ 79%] 2025-12-04T14:00:07.9918305Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex128 PASSED [0.0736s] [ 79%] 2025-12-04T14:00:07.9918751Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex64 PASSED [0.0742s] [ 80%] 2025-12-04T14:00:07.9919128Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float16 PASSED [0.0718s] [ 80%] 2025-12-04T14:00:07.9919511Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float32 PASSED [0.0718s] [ 80%] 2025-12-04T14:00:07.9919890Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float64 PASSED [0.0718s] [ 80%] 2025-12-04T14:00:07.9920256Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int16 PASSED [0.0633s] [ 80%] 2025-12-04T14:00:07.9920627Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int32 PASSED [0.0635s] [ 80%] 2025-12-04T14:00:07.9921033Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int64 PASSED [0.0639s] [ 80%] 2025-12-04T14:00:07.9921403Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int8 PASSED [0.0634s] [ 80%] 2025-12-04T14:00:07.9921809Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_uint8 PASSED [0.0635s] [ 80%] 2025-12-04T14:00:07.9922189Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bfloat16 PASSED [0.0611s] [ 80%] 2025-12-04T14:00:07.9922560Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bool PASSED [0.0526s] [ 80%] 2025-12-04T14:00:07.9922959Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex128 PASSED [0.0629s] [ 80%] 2025-12-04T14:00:07.9923352Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex64 PASSED [0.0635s] [ 80%] 2025-12-04T14:00:07.9923729Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float16 PASSED [0.0611s] [ 80%] 2025-12-04T14:00:07.9924101Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float32 PASSED [0.0611s] [ 81%] 2025-12-04T14:00:07.9924485Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float64 PASSED [0.0612s] [ 81%] 2025-12-04T14:00:07.9924853Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int16 PASSED [0.0527s] [ 81%] 2025-12-04T14:00:07.9925234Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int32 PASSED [0.0526s] [ 81%] 2025-12-04T14:00:07.9925606Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int64 PASSED [0.0532s] [ 81%] 2025-12-04T14:00:07.9925969Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int8 PASSED [0.0526s] [ 81%] 2025-12-04T14:00:07.9926343Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_uint8 PASSED [0.0527s] [ 81%] 2025-12-04T14:00:07.9926725Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bfloat16 PASSED [0.0610s] [ 81%] 2025-12-04T14:00:07.9927095Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bool PASSED [0.0527s] [ 81%] 2025-12-04T14:00:07.9927484Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex128 PASSED [0.0629s] [ 81%] 2025-12-04T14:00:07.9927918Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex64 PASSED [0.0636s] [ 81%] 2025-12-04T14:00:07.9928302Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float16 PASSED [0.0610s] [ 81%] 2025-12-04T14:00:07.9928749Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float32 PASSED [0.0611s] [ 81%] 2025-12-04T14:00:07.9929145Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float64 PASSED [0.0609s] [ 81%] 2025-12-04T14:00:07.9929512Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int16 PASSED [0.0526s] [ 82%] 2025-12-04T14:00:07.9929885Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int32 PASSED [0.0526s] [ 82%] 2025-12-04T14:00:07.9930255Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int64 PASSED [0.0530s] [ 82%] 2025-12-04T14:00:07.9930624Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int8 PASSED [0.0525s] [ 82%] 2025-12-04T14:00:07.9930995Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_uint8 PASSED [0.0525s] [ 82%] 2025-12-04T14:00:07.9931375Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bfloat16 PASSED [0.1002s] [ 82%] 2025-12-04T14:00:07.9931739Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bool PASSED [0.0861s] [ 82%] 2025-12-04T14:00:07.9932182Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex128 PASSED [0.1040s] [ 82%] 2025-12-04T14:00:07.9932571Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex64 PASSED [0.1045s] [ 82%] 2025-12-04T14:00:07.9932984Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float16 PASSED [0.1009s] [ 82%] 2025-12-04T14:00:07.9933368Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float32 PASSED [0.1007s] [ 82%] 2025-12-04T14:00:07.9933744Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float64 PASSED [0.1007s] [ 82%] 2025-12-04T14:00:07.9934116Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int16 PASSED [0.0856s] [ 82%] 2025-12-04T14:00:07.9934480Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int32 PASSED [0.0859s] [ 82%] 2025-12-04T14:00:07.9934849Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int64 PASSED [0.0864s] [ 83%] 2025-12-04T14:00:07.9935217Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int8 PASSED [0.0861s] [ 83%] 2025-12-04T14:00:07.9935586Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_uint8 PASSED [0.0856s] [ 83%] 2025-12-04T14:00:07.9935970Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bfloat16 PASSED [0.1006s] [ 83%] 2025-12-04T14:00:07.9936343Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bool PASSED [0.0859s] [ 83%] 2025-12-04T14:00:07.9936737Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex128 PASSED [0.1039s] [ 83%] 2025-12-04T14:00:07.9937132Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex64 PASSED [0.1047s] [ 83%] 2025-12-04T14:00:07.9937506Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float16 PASSED [0.1007s] [ 83%] 2025-12-04T14:00:07.9937887Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float32 PASSED [0.1002s] [ 83%] 2025-12-04T14:00:07.9938266Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float64 PASSED [0.1004s] [ 83%] 2025-12-04T14:00:07.9938658Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int16 PASSED [0.0859s] [ 83%] 2025-12-04T14:00:07.9939160Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int32 PASSED [0.0860s] [ 83%] 2025-12-04T14:00:07.9939527Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int64 PASSED [0.0866s] [ 83%] 2025-12-04T14:00:07.9939952Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int8 PASSED [0.0859s] [ 83%] 2025-12-04T14:00:07.9940321Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_uint8 PASSED [0.0858s] [ 84%] 2025-12-04T14:00:07.9940702Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bfloat16 PASSED [0.1074s] [ 84%] 2025-12-04T14:00:07.9941076Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bool PASSED [0.0922s] [ 84%] 2025-12-04T14:00:07.9941468Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex128 PASSED [0.1106s] [ 84%] 2025-12-04T14:00:07.9941866Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex64 PASSED [0.1110s] [ 84%] 2025-12-04T14:00:07.9942242Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float16 PASSED [0.1072s] [ 84%] 2025-12-04T14:00:07.9942617Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float32 PASSED [0.1068s] [ 84%] 2025-12-04T14:00:07.9942999Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float64 PASSED [0.1073s] [ 84%] 2025-12-04T14:00:07.9943408Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int16 PASSED [0.0919s] [ 84%] 2025-12-04T14:00:07.9943784Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int32 PASSED [0.0921s] [ 84%] 2025-12-04T14:00:07.9944190Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int64 PASSED [0.0928s] [ 84%] 2025-12-04T14:00:07.9944553Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int8 PASSED [0.0920s] [ 84%] 2025-12-04T14:00:07.9944930Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_uint8 PASSED [0.0922s] [ 84%] 2025-12-04T14:00:07.9945308Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bfloat16 PASSED [0.1066s] [ 84%] 2025-12-04T14:00:07.9945686Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bool PASSED [0.0917s] [ 85%] 2025-12-04T14:00:07.9946080Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex128 PASSED [0.1098s] [ 85%] 2025-12-04T14:00:07.9946466Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex64 PASSED [0.1104s] [ 85%] 2025-12-04T14:00:07.9946852Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float16 PASSED [0.1063s] [ 85%] 2025-12-04T14:00:07.9947225Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float32 PASSED [0.1064s] [ 85%] 2025-12-04T14:00:07.9947604Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float64 PASSED [0.1063s] [ 85%] 2025-12-04T14:00:07.9947972Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int16 PASSED [0.0915s] [ 85%] 2025-12-04T14:00:07.9948338Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int32 PASSED [0.0917s] [ 85%] 2025-12-04T14:00:07.9948741Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int64 PASSED [0.0919s] [ 85%] 2025-12-04T14:00:07.9949125Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int8 PASSED [0.0916s] [ 85%] 2025-12-04T14:00:07.9949498Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_uint8 PASSED [0.0917s] [ 85%] 2025-12-04T14:00:07.9949877Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bfloat16 PASSED [0.0658s] [ 85%] 2025-12-04T14:00:07.9950312Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bool PASSED [0.0503s] [ 85%] 2025-12-04T14:00:07.9950707Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex128 PASSED [0.0691s] [ 85%] 2025-12-04T14:00:07.9951210Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex64 PASSED [0.0696s] [ 86%] 2025-12-04T14:00:07.9951592Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float16 PASSED [0.0657s] [ 86%] 2025-12-04T14:00:07.9951968Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float32 PASSED [0.0657s] [ 86%] 2025-12-04T14:00:07.9952342Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float64 PASSED [0.0655s] [ 86%] 2025-12-04T14:00:07.9952715Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int16 PASSED [0.0504s] [ 86%] 2025-12-04T14:00:07.9953079Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int32 PASSED [0.0503s] [ 86%] 2025-12-04T14:00:07.9953452Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int64 PASSED [0.0509s] [ 86%] 2025-12-04T14:00:07.9953818Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int8 PASSED [0.0503s] [ 86%] 2025-12-04T14:00:07.9954186Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_uint8 PASSED [0.0504s] [ 86%] 2025-12-04T14:00:07.9954616Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bfloat16 PASSED [0.0656s] [ 86%] 2025-12-04T14:00:07.9954986Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bool PASSED [0.0503s] [ 86%] 2025-12-04T14:00:07.9955422Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex128 PASSED [0.0691s] [ 86%] 2025-12-04T14:00:07.9955818Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex64 PASSED [0.0696s] [ 86%] 2025-12-04T14:00:07.9956196Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float16 PASSED [0.0656s] [ 86%] 2025-12-04T14:00:07.9956580Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float32 PASSED [0.0656s] [ 87%] 2025-12-04T14:00:07.9956954Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float64 PASSED [0.0654s] [ 87%] 2025-12-04T14:00:07.9957322Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int16 PASSED [0.0503s] [ 87%] 2025-12-04T14:00:07.9957693Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int32 PASSED [0.0502s] [ 87%] 2025-12-04T14:00:07.9958064Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int64 PASSED [0.0506s] [ 87%] 2025-12-04T14:00:07.9958433Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int8 PASSED [0.0502s] [ 87%] 2025-12-04T14:00:07.9958832Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_uint8 PASSED [0.0501s] [ 87%] 2025-12-04T14:00:07.9959230Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bfloat16 PASSED [0.0483s] [ 87%] 2025-12-04T14:00:07.9959586Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bool PASSED [0.0414s] [ 87%] 2025-12-04T14:00:07.9959967Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex128 PASSED [0.0499s] [ 87%] 2025-12-04T14:00:07.9960347Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex64 PASSED [0.0503s] [ 87%] 2025-12-04T14:00:07.9960711Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float16 PASSED [0.0484s] [ 87%] 2025-12-04T14:00:07.9961073Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float32 PASSED [0.0482s] [ 87%] 2025-12-04T14:00:07.9961484Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float64 PASSED [0.0482s] [ 88%] 2025-12-04T14:00:07.9961839Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int16 PASSED [0.0413s] [ 88%] 2025-12-04T14:00:07.9962237Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int32 PASSED [0.0413s] [ 88%] 2025-12-04T14:00:07.9962595Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int64 PASSED [0.0417s] [ 88%] 2025-12-04T14:00:07.9962952Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int8 PASSED [0.0412s] [ 88%] 2025-12-04T14:00:07.9963319Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_uint8 PASSED [0.0412s] [ 88%] 2025-12-04T14:00:07.9963687Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bfloat16 PASSED [0.0479s] [ 88%] 2025-12-04T14:00:07.9964048Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bool PASSED [0.0413s] [ 88%] 2025-12-04T14:00:07.9964432Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex128 PASSED [0.0496s] [ 88%] 2025-12-04T14:00:07.9964806Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex64 PASSED [0.0506s] [ 88%] 2025-12-04T14:00:07.9965175Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float16 PASSED [0.0483s] [ 88%] 2025-12-04T14:00:07.9965588Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float32 PASSED [0.0482s] [ 88%] 2025-12-04T14:00:07.9965962Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float64 PASSED [0.0481s] [ 88%] 2025-12-04T14:00:07.9966360Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int16 PASSED [0.0413s] [ 88%] 2025-12-04T14:00:07.9966715Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int32 PASSED [0.0413s] [ 89%] 2025-12-04T14:00:07.9967084Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int64 PASSED [0.0416s] [ 89%] 2025-12-04T14:00:07.9967439Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int8 PASSED [0.0413s] [ 89%] 2025-12-04T14:00:07.9967802Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_uint8 PASSED [0.0412s] [ 89%] 2025-12-04T14:00:07.9968169Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bfloat16 PASSED [0.0464s] [ 89%] 2025-12-04T14:00:07.9968525Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bool PASSED [0.0396s] [ 89%] 2025-12-04T14:00:07.9968912Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex128 PASSED [0.0480s] [ 89%] 2025-12-04T14:00:07.9969286Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex64 PASSED [0.0485s] [ 89%] 2025-12-04T14:00:07.9969656Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float16 PASSED [0.0465s] [ 89%] 2025-12-04T14:00:07.9970022Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float32 PASSED [0.0464s] [ 89%] 2025-12-04T14:00:07.9970389Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float64 PASSED [0.0464s] [ 89%] 2025-12-04T14:00:07.9970750Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int16 PASSED [0.0396s] [ 89%] 2025-12-04T14:00:07.9971109Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int32 PASSED [0.0395s] [ 89%] 2025-12-04T14:00:07.9971472Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int64 PASSED [0.0399s] [ 89%] 2025-12-04T14:00:07.9971826Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int8 PASSED [0.0395s] [ 90%] 2025-12-04T14:00:07.9972179Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_uint8 PASSED [0.0395s] [ 90%] 2025-12-04T14:00:07.9972598Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bfloat16 PASSED [0.0464s] [ 90%] 2025-12-04T14:00:07.9972991Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bool PASSED [0.0395s] [ 90%] 2025-12-04T14:00:07.9973370Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex128 PASSED [0.0481s] [ 90%] 2025-12-04T14:00:07.9973752Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex64 PASSED [0.0485s] [ 90%] 2025-12-04T14:00:07.9974115Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float16 PASSED [0.0465s] [ 90%] 2025-12-04T14:00:07.9974485Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float32 PASSED [0.0463s] [ 90%] 2025-12-04T14:00:07.9974847Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float64 PASSED [0.0462s] [ 90%] 2025-12-04T14:00:07.9975203Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int16 PASSED [0.0395s] [ 90%] 2025-12-04T14:00:07.9975563Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int32 PASSED [0.0395s] [ 90%] 2025-12-04T14:00:07.9975917Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int64 PASSED [0.0399s] [ 90%] 2025-12-04T14:00:07.9976273Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int8 PASSED [0.0395s] [ 90%] 2025-12-04T14:00:07.9976670Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_uint8 PASSED [0.0395s] [ 90%] 2025-12-04T14:00:07.9977078Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bfloat16 PASSED [0.0936s] [ 91%] 2025-12-04T14:00:07.9977438Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bool PASSED [0.0788s] [ 91%] 2025-12-04T14:00:07.9977817Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex128 PASSED [0.0969s] [ 91%] 2025-12-04T14:00:07.9978197Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex64 PASSED [0.0973s] [ 91%] 2025-12-04T14:00:07.9978567Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float16 PASSED [0.0933s] [ 91%] 2025-12-04T14:00:07.9978980Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float32 PASSED [0.0933s] [ 91%] 2025-12-04T14:00:07.9979404Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float64 PASSED [0.0933s] [ 91%] 2025-12-04T14:00:07.9979762Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int16 PASSED [0.0788s] [ 91%] 2025-12-04T14:00:07.9980122Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int32 PASSED [0.0786s] [ 91%] 2025-12-04T14:00:07.9980479Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int64 PASSED [0.0791s] [ 91%] 2025-12-04T14:00:07.9980831Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int8 PASSED [0.0789s] [ 91%] 2025-12-04T14:00:07.9981193Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_uint8 PASSED [0.0786s] [ 91%] 2025-12-04T14:00:07.9981561Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bfloat16 PASSED [0.0935s] [ 91%] 2025-12-04T14:00:07.9981919Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bool PASSED [0.0787s] [ 91%] 2025-12-04T14:00:07.9982299Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex128 PASSED [0.0966s] [ 92%] 2025-12-04T14:00:07.9982675Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex64 PASSED [0.0970s] [ 92%] 2025-12-04T14:00:07.9983048Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float16 PASSED [0.0933s] [ 92%] 2025-12-04T14:00:07.9983460Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float32 PASSED [0.0932s] [ 92%] 2025-12-04T14:00:07.9983869Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float64 PASSED [0.0933s] [ 92%] 2025-12-04T14:00:07.9984229Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int16 PASSED [0.0787s] [ 92%] 2025-12-04T14:00:07.9984585Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int32 PASSED [0.0786s] [ 92%] 2025-12-04T14:00:07.9984948Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int64 PASSED [0.0791s] [ 92%] 2025-12-04T14:00:07.9985304Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int8 PASSED [0.0786s] [ 92%] 2025-12-04T14:00:07.9985674Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_uint8 PASSED [0.0788s] [ 92%] 2025-12-04T14:00:07.9986043Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bfloat16 PASSED [0.0468s] [ 92%] 2025-12-04T14:00:07.9986400Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bool PASSED [0.0396s] [ 92%] 2025-12-04T14:00:07.9986785Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex128 PASSED [0.0478s] [ 92%] 2025-12-04T14:00:07.9987157Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex64 PASSED [0.0483s] [ 92%] 2025-12-04T14:00:07.9991112Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float16 PASSED [0.0463s] [ 93%] 2025-12-04T14:00:07.9991509Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float32 PASSED [0.0462s] [ 93%] 2025-12-04T14:00:07.9991945Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float64 PASSED [0.0462s] [ 93%] 2025-12-04T14:00:07.9992304Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int16 PASSED [0.0393s] [ 93%] 2025-12-04T14:00:07.9992660Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int32 PASSED [0.0393s] [ 93%] 2025-12-04T14:00:07.9993017Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int64 PASSED [0.0397s] [ 93%] 2025-12-04T14:00:07.9993374Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int8 PASSED [0.0394s] [ 93%] 2025-12-04T14:00:07.9993727Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_uint8 PASSED [0.0393s] [ 93%] 2025-12-04T14:00:07.9994098Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bfloat16 PASSED [0.0461s] [ 93%] 2025-12-04T14:00:07.9994452Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bool PASSED [0.0393s] [ 93%] 2025-12-04T14:00:07.9994829Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex128 PASSED [0.0477s] [ 93%] 2025-12-04T14:00:07.9995206Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex64 PASSED [0.0485s] [ 93%] 2025-12-04T14:00:07.9995569Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float16 PASSED [0.0463s] [ 93%] 2025-12-04T14:00:07.9995933Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float32 PASSED [0.0462s] [ 93%] 2025-12-04T14:00:07.9996294Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float64 PASSED [0.0463s] [ 94%] 2025-12-04T14:00:07.9996648Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int16 PASSED [0.0394s] [ 94%] 2025-12-04T14:00:07.9997007Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int32 PASSED [0.0394s] [ 94%] 2025-12-04T14:00:07.9997359Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int64 PASSED [0.0398s] [ 94%] 2025-12-04T14:00:07.9997711Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int8 PASSED [0.0393s] [ 94%] 2025-12-04T14:00:07.9998113Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_uint8 PASSED [0.0394s] [ 94%] 2025-12-04T14:00:07.9998550Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bfloat16 PASSED [0.0445s] [ 94%] 2025-12-04T14:00:07.9998933Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bool PASSED [0.0377s] [ 94%] 2025-12-04T14:00:07.9999313Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex128 PASSED [0.0461s] [ 94%] 2025-12-04T14:00:07.9999688Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex64 PASSED [0.0467s] [ 94%] 2025-12-04T14:00:08.0000052Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float16 PASSED [0.0446s] [ 94%] 2025-12-04T14:00:08.0000413Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float32 PASSED [0.0444s] [ 94%] 2025-12-04T14:00:08.0000781Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float64 PASSED [0.0446s] [ 94%] 2025-12-04T14:00:08.0001136Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int16 PASSED [0.0377s] [ 94%] 2025-12-04T14:00:08.0001494Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int32 PASSED [0.0376s] [ 95%] 2025-12-04T14:00:08.0001848Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int64 PASSED [0.0380s] [ 95%] 2025-12-04T14:00:08.0002242Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int8 PASSED [0.0376s] [ 95%] 2025-12-04T14:00:08.0002639Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_uint8 PASSED [0.0377s] [ 95%] 2025-12-04T14:00:08.0003005Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bfloat16 PASSED [0.0446s] [ 95%] 2025-12-04T14:00:08.0003365Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bool PASSED [0.0377s] [ 95%] 2025-12-04T14:00:08.0003743Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex128 PASSED [0.0460s] [ 95%] 2025-12-04T14:00:08.0004116Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex64 PASSED [0.0467s] [ 95%] 2025-12-04T14:00:08.0004480Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float16 PASSED [0.0446s] [ 95%] 2025-12-04T14:00:08.0004842Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float32 PASSED [0.0445s] [ 95%] 2025-12-04T14:00:08.0005208Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float64 PASSED [0.0446s] [ 95%] 2025-12-04T14:00:08.0005562Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int16 PASSED [0.0377s] [ 95%] 2025-12-04T14:00:08.0005915Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int32 PASSED [0.0376s] [ 95%] 2025-12-04T14:00:08.0006273Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int64 PASSED [0.0381s] [ 95%] 2025-12-04T14:00:08.0006625Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int8 PASSED [0.0376s] [ 96%] 2025-12-04T14:00:08.0006979Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_uint8 PASSED [0.0377s] [ 96%] 2025-12-04T14:00:08.0007414Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSC_cuda_float64 SKIPPED [0.0014s] (Only runs on cpu) [ 96%] 2025-12-04T14:00:08.0008061Z 
test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSR_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 96%] 2025-12-04T14:00:08.0008513Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCOO_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 96%] 2025-12-04T14:00:08.0008972Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSC_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 96%] 2025-12-04T14:00:08.0009479Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSR_cuda_float64 SKIPPED [0.0018s] (Only runs on cpu) [ 96%] 2025-12-04T14:00:08.0009957Z test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_Strided_cuda_float64 SKIPPED [0.0012s] (Only runs on cpu) [ 96%] 2025-12-04T14:00:08.0010399Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSC_cuda PASSED [0.0017s] [ 96%] 2025-12-04T14:00:08.0010844Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSR_cuda PASSED [0.0020s] [ 96%] 2025-12-04T14:00:08.0011281Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCOO_cuda PASSED [0.0017s] [ 96%] 2025-12-04T14:00:08.0011719Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSC_cuda PASSED [0.0016s] [ 96%] 2025-12-04T14:00:08.0012157Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSR_cuda PASSED [0.0017s] [ 96%] 2025-12-04T14:00:08.0012588Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_Strided_cuda PASSED [0.0019s] [ 96%] 2025-12-04T14:00:08.0013017Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSC_cuda PASSED [0.0019s] [ 97%] 2025-12-04T14:00:08.0013442Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSR_cuda PASSED [0.0017s] [ 97%] 
2025-12-04T14:00:08.0013927Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCOO_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0014404Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSC_cuda PASSED [0.0017s] [ 97%]
2025-12-04T14:00:08.0014828Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSR_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0015246Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_Strided_cuda PASSED [0.0019s] [ 97%]
2025-12-04T14:00:08.0015682Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSC_cuda PASSED [0.0020s] [ 97%]
2025-12-04T14:00:08.0016117Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSR_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0016550Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCOO_cuda PASSED [0.0017s] [ 97%]
2025-12-04T14:00:08.0016982Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSC_cuda PASSED [0.0017s] [ 97%]
2025-12-04T14:00:08.0017419Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSR_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0017843Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_Strided_cuda PASSED [0.0014s] [ 97%]
2025-12-04T14:00:08.0018284Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSC_cuda PASSED [0.0023s] [ 97%]
2025-12-04T14:00:08.0018725Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSR_cuda PASSED [0.0016s] [ 97%]
2025-12-04T14:00:08.0019209Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCOO_cuda PASSED [0.0017s] [ 98%]
2025-12-04T14:00:08.0019652Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSC_cuda PASSED [0.0017s] [ 98%]
2025-12-04T14:00:08.0020090Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSR_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0020520Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_Strided_cuda PASSED [0.0015s] [ 98%]
2025-12-04T14:00:08.0020989Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSC_cuda PASSED [0.0023s] [ 98%]
2025-12-04T14:00:08.0021451Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSR_cuda PASSED [0.0017s] [ 98%]
2025-12-04T14:00:08.0021874Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCOO_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0022292Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSC_cuda PASSED [0.0017s] [ 98%]
2025-12-04T14:00:08.0022719Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSR_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0023129Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_Strided_cuda PASSED [0.0014s] [ 98%]
2025-12-04T14:00:08.0023567Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSC_cuda PASSED [0.0023s] [ 98%]
2025-12-04T14:00:08.0024010Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSR_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0024447Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCOO_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0024885Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSC_cuda PASSED [0.0016s] [ 98%]
2025-12-04T14:00:08.0025407Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSR_cuda PASSED [0.0017s] [ 99%]
2025-12-04T14:00:08.0025838Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_Strided_cuda PASSED [0.0014s] [ 99%]
2025-12-04T14:00:08.0026314Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSC_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0026746Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSR_cuda PASSED [0.0026s] [ 99%]
2025-12-04T14:00:08.0027188Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCOO_cuda PASSED [0.0017s] [ 99%]
2025-12-04T14:00:08.0027622Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSC_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0028053Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSR_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0028506Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_Strided_cuda PASSED [0.0015s] [ 99%]
2025-12-04T14:00:08.0028947Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSC_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0029367Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSR_cuda PASSED [0.0020s] [ 99%]
2025-12-04T14:00:08.0029780Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCOO_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0030198Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSC_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0030614Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSR_cuda PASSED [0.0016s] [ 99%]
2025-12-04T14:00:08.0031016Z test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_Strided_cuda PASSED [0.0016s] [100%]
2025-12-04T14:00:08.0031022Z
2025-12-04T14:00:08.0031525Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.xml -
2025-12-04T14:00:08.0031729Z ======== 1199 passed, 193 skipped, 1708 deselected in 651.49s (0:10:51) ========
2025-12-04T14:00:08.0032441Z The following tests failed consistently: ['test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64', 'test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64']
2025-12-04T14:00:08.0032500Z
2025-12-04T14:00:08.0032832Z FINISHED PRINTING LOG FILE of test_sparse 1/1 (test/test-reports/test_sparse_1.1_e217f60a40d48402_.log)
2025-12-04T14:00:08.0032837Z
2025-12-04T14:00:08.0033104Z Finished test_sparse 1/1 ... [2025-12-04 14:00:07.585002][17279.594224164], took 15.45min
2025-12-04T14:00:08.0033646Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.xml
2025-12-04T14:00:08.0034179Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.xml
2025-12-04T14:00:08.0034708Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.xml
2025-12-04T14:00:08.0035233Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.xml
2025-12-04T14:00:08.0035784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.xml
2025-12-04T14:00:08.0036314Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.xml
2025-12-04T14:00:08.0036841Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.xml
2025-12-04T14:00:08.4830711Z Uploading logs for 57118183212 to S3
2025-12-04T14:00:08.6549021Z Uploading artifacts took 0.72 seconds
2025-12-04T14:00:08.6549334Z test_sparse 1/1 failed!
2025-12-04T14:00:08.6552669Z Running test_ci_sanity_check_fail 1/1 ... [2025-12-04 14:00:08.654967][17280.664189554]
2025-12-04T14:00:08.6553140Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:00:08.6557183Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ci_sanity_check_fail.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:00:08.655374]
2025-12-04T14:00:23.4921403Z Finished test_ci_sanity_check_fail 1/1 ... [2025-12-04 14:00:23.491617][17295.500837836], took 0.25min
2025-12-04T14:00:23.5094510Z Running test_ops_fwd_gradients 6/12 ... [2025-12-04 14:00:23.508944][17295.518170471]
2025-12-04T14:00:23.5095143Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:00:23.5096307Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_fwd_gradients.py', '--shard-id=6', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:00:23.509263]
2025-12-04T14:12:56.1077563Z
2025-12-04T14:12:56.1078719Z test_ops_fwd_gradients 6/12 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_fwd_gradients_6.12_abead446b517b77f_.log
2025-12-04T14:12:56.1254378Z Running 276 items in this shard: test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_T_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rdiv___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_baddbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_baddbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cartesian_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_chalf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_inverse_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumulative_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumulative_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagflat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_div_floor_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_einsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expm1_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fliplr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_floor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_frac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gradient_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_heaviside_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isreal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lerp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eig_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_tensorinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vector_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_or_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nanmean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_grid_sample_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_kl_div_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softsign_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_threshold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ormqr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_polygamma_polygamma_n_4_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_conj_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_decimals_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsub_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scalar_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_searchsorted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_short_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_slice_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sparse_mm_reduce_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_list_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_multiple_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sum_to_size_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tensor_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_transpose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_where_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__chunk_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_baddbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_broadcast_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cauchy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_char_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clamp_max_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_column_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_constant_pad_nd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_contiguous_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_div_floor_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_div_trunc_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expm1_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_eye_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ihfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flip_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_frexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_geqrf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_unary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_slogdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_triangular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svdvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorsolve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logaddexp2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_xor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_long_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mH_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_logaddexp_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanmean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_native_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_ones_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_embedding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_area_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_static_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_inf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_nuc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_number_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polygamma_polygamma_n_4_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rsqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_select_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_slice_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_xlog1py_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_list_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_with_sizes_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_to_size_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_to_size_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsqueeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addcdiv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_all_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_allclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_any_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_arange_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argwhere_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_partial_views_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_constant_pad_nd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_count_nonzero_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cummin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_equal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_erf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ihfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_float_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_grid_sampler_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isneginf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_istft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_binary_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eig_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vector_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_xor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mT_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_max_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_median_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_list_of_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_variadic_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_narrow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_native_dropout_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_hardtanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_circular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softplus_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_normal_in_place_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pca_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pinverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_real_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reciprocal_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signbit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_spherical_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_square_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_square_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_to_size_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensor_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_triu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unbind_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unflatten_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vsplit_cuda_float64
2025-12-04T14:12:56.1427600Z
2025-12-04T14:12:56.1427912Z Finished test_ops_fwd_gradients 6/12 ... [2025-12-04 14:12:56.107871][18048.117094123], took 12.54min
2025-12-04T14:12:56.1428976Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-95ccd07868721469.xml
2025-12-04T14:12:56.2244715Z Running test_ops_gradients 2/10 ... [2025-12-04 14:12:56.224071][18048.233293292]
2025-12-04T14:12:56.2245355Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:12:56.2248317Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_gradients.py', '--shard-id=2', '--num-shards=10', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:12:56.224436]
2025-12-04T14:25:25.8651557Z
2025-12-04T14:25:25.8653247Z test_ops_gradients 2/10 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_gradients_2.10_8b90327e47e16b38_.log
2025-12-04T14:25:25.8873408Z Running 520 items in this shard: test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyMulScalarCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyTakeCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__chunk_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__softmax_backward_data_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__unsafe_masked_index_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atleast_1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_copysign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cummax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_permuted_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_permuted_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ihfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_float_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_geqrf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_int_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_int_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isnan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_le_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_svdvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_tensorsolve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_xor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_median_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_max_binary_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softplus_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reciprocal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_searchsorted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_modified_bessel_k1_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unbind_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unflatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unfold_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsafe_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyCubeCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyMulCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__segment_reduce_lengths_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_any_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cdouble_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_div_no_rounding_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fmod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_geqrf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_fill_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_unary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_householder_product_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lstsq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_pinv_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vecdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_long_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_map_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_log_softmax_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_multinomial_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_kl_div_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_circular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_circular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_replicate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_real_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_remainder_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_short_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_hann_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sinc_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_with_sizes_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_take_along_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_H_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyMulScalarCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpySortCustomOp_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_acos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_all_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_argwhere_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_max_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diag_embed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_double_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_einsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_like_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_inner_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isnan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_item_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_pinv_hermitian_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_vecdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_long_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_narrow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_hardswish_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softsign_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_fro_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_interleave_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_slice_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_spherical_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_split_with_sizes_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_svd_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_transpose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unbind_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyMulCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_argwhere_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_block_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_count_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagflat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_eye_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ge_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_int_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_householder_product_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_inv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_meshgrid_list_of_tensors_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_movedim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_embedding_bag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_rms_norm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pinverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_polygamma_polygamma_n_4_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randn_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ravel_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rot90_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sqrt_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_svd_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_T_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_argsort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagflat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_copy_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ihfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_full_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lerp_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_triangular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_tensorinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logaddexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_or_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_native_layer_norm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_hardsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_logsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rand_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_randn_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rsqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_erfcx_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_square_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_svd_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tensordot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tril_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_triu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unbind_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vsplit_cuda_complex128 2025-12-04T14:25:25.9085989Z 2025-12-04T14:25:25.9086281Z Finished test_ops_gradients 2/10 ... [2025-12-04 14:25:25.866059][18797.875283273], took 12.49min 2025-12-04T14:25:25.9087358Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-1e96fc6cc9093b07.xml 2025-12-04T14:25:26.5873869Z Uploading artifacts took 0.62 seconds 2025-12-04T14:25:26.5877040Z Running test_ops_gradients 10/10 ... [2025-12-04 14:25:26.587365][18798.596588143] 2025-12-04T14:25:26.5877648Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:25:26.5881704Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_gradients.py', '--shard-id=10', '--num-shards=10', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:25:26.587769] 2025-12-04T14:39:29.0135786Z 2025-12-04T14:39:29.0137372Z test_ops_gradients 10/10 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_gradients_10.10_690d4f6748dd1bf7_.log 2025-12-04T14:39:29.0384416Z Running 574 items in this shard: test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_any_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_argwhere_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bernoulli_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bucketize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cdouble_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cross_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_digamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dist_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_eq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_eye_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_hash_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_add_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isfinite_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_kron_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_householder_product_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_ldl_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_tensorinv_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vander_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logaddexp2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logaddexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_zeros_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_ctc_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_mish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_prelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_selu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_inf_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_nuc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randint_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_real_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_repeat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i0e_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_zeta_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyWithIntCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___radd___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addcdiv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cov_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_copy_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expand_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expand_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_gradient_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isnan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_item_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_slogdet_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log1p_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_and_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_xor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nan_to_num_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_ones_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_inf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_permute_copy_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resolve_neg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rot90_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_select_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_general_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_std_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_t_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_transpose_copy_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_transpose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triangular_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tril_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unbind_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_vsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_T_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_angle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atleast_1d_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bucketize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cholesky_inverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_column_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_constant_pad_nd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flip_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_geqrf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_rank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_multi_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_pinv_singular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linspace_tensor_overload_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_map_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_maximum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_min_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_narrow_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_constant_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_selu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_normal_in_place_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_4_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_renorm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_round_decimals_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scan_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_i1_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tile_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zero__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rmatmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_as_strided_partial_views_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cartesian_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chalf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_corrcoef_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_grid_sampler_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gt_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isreal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_unary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_multi_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorsolve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_vander_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_unpack_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mT_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_max_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nanquantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_grid_sample_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_mse_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_relu6_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ones_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rsub_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_short_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_slice_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_slice_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_log_ndtr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tanh_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unfold_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_consecutive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsafe_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_where_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_zero__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rmatmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_acos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_partial_views_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_block_diag_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_digamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_erfc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_full_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gather_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hsplit_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_int_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isreal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lgamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_singular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_or_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_median_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_movedim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose3d_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_dropout3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resolve_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_round_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_reduce_sum_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_squeeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vdot_cuda_complex128 2025-12-04T14:39:29.0620055Z 2025-12-04T14:39:29.0620350Z Finished test_ops_gradients 10/10 ... 
[2025-12-04 14:39:29.015053][19641.024277061], took 14.04min 2025-12-04T14:39:29.0621431Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-91f289dc18834c3e.xml 2025-12-04T14:39:29.1227230Z Running functorch/test_ops 3/6 ... [2025-12-04 14:39:29.122334][19641.13155584] 2025-12-04T14:39:29.1227771Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:39:29.1230253Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=3', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:39:29.122661] 2025-12-04T14:52:58.8503328Z 2025-12-04T14:52:58.8504350Z functorch/test_ops 3/6 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_3.6_4e22832cb04fe87a_.log 2025-12-04T14:52:58.9188635Z Running 1655 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_data_write_errors_under_transform_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_corrcoef_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lgamma_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ops_aten_index_put_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_deg2rad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_power_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_strided_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nonzero_static_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_hermite_polynomial_he_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyMulAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_max_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvalsh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amax_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_le_cuda_complex32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_minimum_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_sort_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_contiguous_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_flatten_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_dsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_unbind_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_vsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mH_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mT_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_permute_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpySortAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfinv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linspace_tensor_overload_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardshrink_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_igammac_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_select_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_l1_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rand_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cdouble_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cholesky_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool1d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_upsample_bilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_uniform_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpySortAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmatmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_acos_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_allclose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_partial_views_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bernoulli_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_to_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cauchy_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_combinations_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_complex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_physical_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_corrcoef_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cosh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumsum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_trunc_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expm1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftshift_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flip_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ge_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igammac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lgamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_det_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvals_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_ex_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_grad_oriented_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_hermitian_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logaddexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_log_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_logsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_with_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mv_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_ones_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_celu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_huber_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_local_response_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_grad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_head_attention_forward_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_poisson_nll_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rrelu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_smooth_l1_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_tanhshrink_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_conj_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_neg_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_searchsorted_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sgn_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_u_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_erfcx_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sqrt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_multiple_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_chunk_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyTakeAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_eye_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_byte_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cond_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_batch_norm_without_cudnn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_fro_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sum_to_size_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_int_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mul_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_bilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_u_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcdiv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_strided_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_qr_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_grid_sample_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polar_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tanh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bfloat16_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ScaleGradGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fftshift_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lstsq_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nansum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hinge_embedding_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_scatter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_take_along_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvalsh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_blackman_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rsub___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logdet_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hinge_embedding_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_put_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sum_to_size_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32 2025-12-04T14:52:58.9850113Z 2025-12-04T14:52:58.9850410Z Finished functorch/test_ops 3/6 ... [2025-12-04 14:52:58.852897][20450.862119719], took 13.50min 2025-12-04T14:52:58.9851437Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-05b5b699aba88456.xml 2025-12-04T14:52:59.5613333Z Uploading artifacts took 0.58 seconds 2025-12-04T14:52:59.5617211Z Running dynamo/test_after_aot 1/1 ... [2025-12-04 14:52:59.561438][20451.570660896] 2025-12-04T14:52:59.5617676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:52:59.5621977Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_after_aot.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:52:59.561849] 2025-12-04T14:53:08.3936257Z 2025-12-04T14:53:08.3937128Z dynamo/test_after_aot 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_after_aot_1.1_e8843ead62c525f1_.log 2025-12-04T14:53:08.3938571Z Running 2 items in this shard: test/dynamo/test_after_aot.py::TestAfterAot::test_dump_tensor, test/dynamo/test_after_aot.py::TestAfterAot::test_save_graph_repro 2025-12-04T14:53:08.3939462Z 2025-12-04T14:53:08.3939744Z Finished dynamo/test_after_aot 1/1 ... [2025-12-04 14:53:08.393221][20460.402445603], took 0.15min 2025-12-04T14:53:08.4115436Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_after_aot/dynamo.test_after_aot-138e4478191117d7.xml 2025-12-04T14:53:08.4879423Z Running inductor/test_snode_runtime 1/1 ... [2025-12-04 14:53:08.487554][20460.496776868] 2025-12-04T14:53:08.4879993Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:53:08.4882416Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_snode_runtime.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:53:08.487869] 2025-12-04T14:53:24.5332558Z 2025-12-04T14:53:24.5334026Z inductor/test_snode_runtime 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_snode_runtime_1.1_f8102af9af532885_.log 2025-12-04T14:53:24.5347739Z Running 22 items in this shard: test/inductor/test_snode_runtime.py::UnsupportedTests::test_no_cuda, test/inductor/test_snode_runtime.py::UnsupportedTests::test_no_op, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_addmm, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_bmm, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_conv1d, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_conv2d, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_conv2d_transpose, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_conv3d, test/inductor/test_snode_runtime.py::ComputeBoundedTests::test_mm, test/inductor/test_snode_runtime.py::MemoryBoundedTests::test_dynamic, test/inductor/test_snode_runtime.py::MemoryBoundedTests::test_horizontal_reduction_pointwise, test/inductor/test_snode_runtime.py::MemoryBoundedTests::test_pointwise, test/inductor/test_snode_runtime.py::MemoryBoundedTests::test_relu, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_all_gather_into_tensor, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_all_gather_into_tensor_coalesced, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_all_reduce, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_all_reduce_coalesced, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_legacy_all_gather_into_tensor_coalesced, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_legacy_all_reduce, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_legacy_all_reduce_coalesced, test/inductor/test_snode_runtime.py::TestCommAnalysis::test_reduce_scatter_tensor, 
test/inductor/test_snode_runtime.py::TestCommAnalysis::test_reduce_scatter_tensor_coalesced 2025-12-04T14:53:24.5358709Z 2025-12-04T14:53:24.5359185Z Finished inductor/test_snode_runtime 1/1 ... [2025-12-04 14:53:24.532906][20476.542130993], took 0.27min 2025-12-04T14:53:24.5523692Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_snode_runtime/inductor.test_snode_runtime-f1ec066e866be26d.xml 2025-12-04T14:53:24.6188163Z Running inductor/test_compiled_autograd 1/2 ... [2025-12-04 14:53:24.618376][20476.627598109] 2025-12-04T14:53:24.6188994Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:53:24.6190920Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_autograd.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:53:24.618718] 2025-12-04T15:01:30.9757799Z 2025-12-04T15:01:30.9758863Z inductor/test_compiled_autograd 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_autograd_1.2_d8737cb5eeb8c364_.log 2025-12-04T15:01:30.9995981Z Running 438 items in this shard: test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_3, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_5_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_3_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_3_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_already_nan, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_backward, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_grad, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_basic_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_data_dependent_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_id_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_non_traceable, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_dynamic_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_float_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_int_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_int_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_backward_hook_relative_ordering_partial, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cache_hit, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_sac, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_simple_reentrant_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_simple_reentrant_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_inductor, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_optimize_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compiled_autograd_does_not_specialize_on_bw_symints, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cpu_offloading, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_graph, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_scalar_used_in_cpp_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_scalar_used_in_python_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_sdpa, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_bw_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_compiled_fw_bw_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_dynamically_defined_class, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_multiple_grads, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_attr, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_multiple_tensors, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_tensors, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_ddp_cpp_reducer_error, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_ddp_python_reducer, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_disk_offloading, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_annotations, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_eager_node, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamo_boxed, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_flex_attention, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_free_activation_memory_subclass, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_higher_order_gradients, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_hipify_not_loaded_with_import_cpp_extension, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_hipify_not_loaded_with_import_torch, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_inplace_grad_update, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_inputs_aliasing_bytecode_stack_restore, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_issue106555, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_keep_graph_usage_after_compiled, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_logging_tensor_flaky, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_output_nodes_all_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_pre_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_tensor_pre_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reset, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_saved_tensor_unpack_hook_ordering, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook1, 
test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_compile_only_backward_call, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_function_mode, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_trace_run_with_rng_state, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_aot_dispatcher_nodes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_aot_dispatcher_nodes_hop, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_cpp, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_snapshot, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_access_saved_tensor_twice_without_recomputation_works, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_posthooks_can_observe_tensor_prehook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_posthooks_should_not_execute, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_with_zero_numel_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_assign_parent_cleanup, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_detect_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_mode_no_check_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_view_of_view, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_views_creation_meta, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_views_cross_dtype, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_multiple_views_python, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_simple_views_python, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_views_codegen, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_copy, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_create_graph_warns, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_hook_relative_ordering, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_no_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_retained_graph_with_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_with_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_with_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_calculate_shape_util, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_callback_adds_callback, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_cant_create_saved_tensors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_detects_non_determinism, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_graph_execution_group, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_valid_reset_on_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_correct_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_custom_function_works, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_dataparallel, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_True, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_memory_savings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_create_graph_and_full_backward_hook_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_current_graph_task_execution_order, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_ac_early_stop, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_no_early_free, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_repeated_grad_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_error, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_exception, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_non_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_non_tensor_before_tensor_args, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_wrong_formula, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_mark_dirty_not_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_preserve_torch_function_when_return_as_is, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_saved_tensors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_saving_mutated_view_no_leak, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_setup_context_simple, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_vmap_defaults, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_deep_reentrant, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dep_nograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dependent_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_base, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_then_inplace_raises_in_autograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_disabling_saved_tensor_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_disabling_saved_tensor_hooks_nested, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_duplicate_backward_root, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_enable_grad_decorator_no_paren, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_first_grad_fn_access_in_no_grad_mode, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_free_deep_graph_complicated, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_free_deep_graph_pyfunction, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_get_data_and_hooks_from_raw_saved_variable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_empty_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_input_metadata, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_nonleaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_nonleaf_register_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_thread_safety, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_inplace, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_materialize, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_unreachable_discovery, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_forward_or_backward_only, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_complex_non_complex_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_custom_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_dense_and_sparse_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_respects_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_runs_with_no_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout2, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout4, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_output_shape_or_dtype_depend_on_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_test_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_validates_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_graph_save_on_cpu, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_edge_case_when_called_with_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_none, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hooks_cpp, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_indexing, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_not_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_leaf_errors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_weak_grad_fn, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_integer_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_legacy_function_deprecation_exception, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_lobpcg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_mark_non_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_materialize_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_backward_no_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_named_tensor_for_complex_views, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_anomaly_access, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_autograd_function_stashing_ctx, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_nested_anomaly_printstack_cleanup, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_next_functions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_python_function, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_requires_grad_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_unnecessary_save, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_not_implemented_fwad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pickle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_gets_cleaned_up, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_returns_not_None, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pow_zero_tensor_gradient, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_power_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_prehook_ordering, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_aggregation_table, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_function_event_avg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_seq_nr, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_shapes, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_record_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_child_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_callbacks_depth_0, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_leaf_variable_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_requires_grad_, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retains_grad_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_duplicate, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_duplicate_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_leaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_none_for_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_on_cpu_and_checkpoint, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_output_nr, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_custom_function_intermediates, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_extra_enter_during_bw_no_leak, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_saved_original_with_default_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_scalar_grad_mixed_device, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_select_expanded_v, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_data_tensorimpl_type, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_coroutines_benign_exceptions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_enabled_wraps, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_generator_functions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_materialize_non_diff_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_shape, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sharded_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_both_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_dim_neg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_ind_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_grad_warnings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_hooks_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_thread_shutdown, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_too_many_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unrelated_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unused_output, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_var_mean_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_view_func_replay_with_modified_state, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_volatile_deprecated, 
test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_will_engine_execute_node, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_kwargs_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_reentrant_backwards_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_reentrant_backwards_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_same_graph_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_two_children_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_two_children_early_stop_True, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op_with_CompositeExplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_grad_for_nontensor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_incorrect_schema_mutable, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_incorrect_schema_no_output, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_with_key_key_AutogradCUDA, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_output_differentiability_tensorlist, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_tensorlist_input_requires_list_grads_with_same_numel, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_basic_make_fx, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_basic, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_nms_dynamic_compile, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_defined_in_python, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_duplicate_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_abstract_overload, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_device_cpu, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_invalid_devices, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_multiple, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CPU, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CompositeImplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_separate, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_supported, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_unsupported, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_invalid_qualname, 
test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_invalid_schemas, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_is_functional_schema, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_is_tensorlist_like_type, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_define, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_meta_for_data_dependent_shape_operation, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_name_must_match, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_new_data_dependent_symint, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_meta, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_private_ctor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_supported_param_types, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_symints, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_unsupported_schemas, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_allow_python_side_effects_utility, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_constants, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_input_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_numpy_number, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_tracked, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_untracked_global_nested, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_branches_no_arguments, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_free_variable_in_both_branches, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_graph_break_in_one_branch, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_pytree_operands, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_side_effect_in_one_branches, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_with_constant_pred, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_fallback_on_graph_break_simple, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_freevars_as_inputs_to_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_grad_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hints_wrapper_no_hints, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hopify_generic_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_internal_nonlocal, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_lift_tensors_with_compound_expressions, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_kwargs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_lowers_to_graph, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_multi_return, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_pytree_return, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_subgraph_name_is_valid, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_nested_tuple_output, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_nested_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_no_freevars, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_output_with_dict, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_register_subclass, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_var, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_var_used_multiple_times, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_vars, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_nonlocal_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_local_list_append_no_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_list, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_num, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_num_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_tensor, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_num_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_tensor_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_nested_nonlocal_list_append_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_nonlocal_list_append_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_global_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_nonlocal_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_global_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_symint_in_slice, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_unbacked_symbol_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_multiply_scalar, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_allow_local_assign_in_body_fn, 
test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_inductor_compiled_regions_option, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_default_else_branch, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_only, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_recompile, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_pytree_kwargs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_source_fn_stack, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_functional_call_sequential_params_and_buffers, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_call_compiled_backward_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_fn_with_kwargs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_freevar_python_scalar, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_pytree, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_with_graph_break, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_with_side_effect, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_hessian, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_hessian_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_two_tensors_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_simple, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_two_tensors_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_teardown_resets_nested_graph_breaks, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_call_compiled_backward_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_multiple_outputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_multiple_outputs_python_struct, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_call_torch_compile_fn, 
test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_free_const, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_invocation_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_invocation_out_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_outputs_diff_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_over_vmap_captured, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_pytree_inputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_different_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_same_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_side_effects, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_side_effects_append_input, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_two_inputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_two_inputs_tuple_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_conditional_graph_break, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_graph_break, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_cond_with_invalid_kwargs, 
test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_dropout_inductor, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_flop_counter_for_cond, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_flop_counter_for_cond_unbalanced_branches, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_function, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_module, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_non_aliasing_util, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_device_mesh_compile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_basic_export, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_constructor_w_dynamo_disable, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_constructor_w_graph_break, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_different_gradient_placement, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dont_recompile_on_same_placement_devicemesh, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic_loss_parallel_log_softmax, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic_slice, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamo_device_mesh_attrs, 
test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_partial_placement_graph_output, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_partial_placement_redistribute_unbalanced_correct_strides, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_requires_grad_recompile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_redistribute, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_redistribute_async, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_recompile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_from_local_grad_placements_sequence_intermediate, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_from_local_grad_placements_sequence_intermediate_as_args, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_grad_placements_sequence, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_grad_placements_sequence_intermediate, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_kwargs, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_kwargs_forward_hook, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_fakify_dtensor, 
test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_graph_input_is_async, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_placement_compile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_unwrap_async_collective_tensor_tangent, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_cond_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_quant_packed_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_subgraph_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_map_nested_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_map_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_while_loop_simple_cuda_float32
2025-12-04T15:01:31.0220470Z
2025-12-04T15:01:31.0220818Z Finished inductor/test_compiled_autograd 1/2 ... [2025-12-04 15:01:30.976888][20962.986111824], took 8.11min
2025-12-04T15:01:31.0222028Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-bf57fb8d20e32a72.xml
2025-12-04T15:01:31.0963228Z Running test_testing 1/1 ... [2025-12-04 15:01:31.095949][20963.105171419]
2025-12-04T15:01:31.0963771Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:01:31.0966832Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_testing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:01:31.096291]
2025-12-04T15:02:21.1544740Z
2025-12-04T15:02:21.1545572Z test_testing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_testing_1.1_6250d60ab394f89f_.log
2025-12-04T15:02:21.2499977Z Running 2074 items in this shard: test/test_testing.py::TestTestingCUDA::test_assertEqual_longMessage_cuda, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_bool, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int8, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_not_stop_common_distributed_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_device_type_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_utils_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_get_supported_dtypes_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_bool, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float64, 
test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int64, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_bool_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_isclose_equality_shortcut_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float64, test/test_testing.py::TestTestingCUDA::test_setup_and_teardown_run_for_device_specific_tests_cuda, test/test_testing.py::TestTestingCUDA::test_supported_dtypes_abs_cuda, 
test/test_testing.py::TestFrameworkUtils::test_filtering_env_var, test/test_testing.py::TestAssertClose::test_bool, test/test_testing.py::TestAssertClose::test_default_tolerance_selection_mismatching_dtypes, test/test_testing.py::TestAssertClose::test_docstring_examples, test/test_testing.py::TestAssertClose::test_matching, test/test_testing.py::TestAssertClose::test_matching_atol, test/test_testing.py::TestAssertClose::test_matching_conjugate_bit, test/test_testing.py::TestAssertClose::test_matching_nan, test/test_testing.py::TestAssertClose::test_matching_nan_with_equal_nan, test/test_testing.py::TestAssertClose::test_matching_rtol, test/test_testing.py::TestAssertClose::test_meta, test/test_testing.py::TestAssertClose::test_mismatching_dtype, test/test_testing.py::TestAssertClose::test_mismatching_dtype_no_check, test/test_testing.py::TestAssertClose::test_mismatching_layout, test/test_testing.py::TestAssertClose::test_mismatching_layout_no_check, test/test_testing.py::TestAssertClose::test_mismatching_shape, test/test_testing.py::TestAssertClose::test_mismatching_stride, test/test_testing.py::TestAssertClose::test_mismatching_stride_no_check, test/test_testing.py::TestAssertClose::test_mismatching_types, test/test_testing.py::TestAssertClose::test_mismatching_types_subclasses, test/test_testing.py::TestAssertClose::test_mismatching_types_type_equality, test/test_testing.py::TestAssertClose::test_mismatching_values, test/test_testing.py::TestAssertClose::test_mismatching_values_atol, test/test_testing.py::TestAssertClose::test_mismatching_values_rtol, test/test_testing.py::TestAssertClose::test_none, test/test_testing.py::TestAssertClose::test_none_mismatch, test/test_testing.py::TestAssertClose::test_numpy, test/test_testing.py::TestAssertClose::test_only_atol, test/test_testing.py::TestAssertClose::test_only_rtol, test/test_testing.py::TestAssertClose::test_scalar, test/test_testing.py::TestAssertClose::test_unexpected_error_compare, 
test/test_testing.py::TestAssertClose::test_unexpected_error_originate, test/test_testing.py::TestAssertClose::test_unknown_layout, test/test_testing.py::TestAssertClose::test_unknown_type, test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_cuda, test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_no_check_cuda, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_atol, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_scalars, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_tensor_likes, test/test_testing.py::TestAssertCloseErrorMessage::test_mismatched_elements, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_callable, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_str, test/test_testing.py::TestAssertCloseErrorMessage::test_not_close, test/test_testing.py::TestAssertCloseErrorMessage::test_not_equal, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_rtol, test/test_testing.py::TestAssertCloseErrorMessage::test_small_float_dtype, test/test_testing.py::TestAssertCloseErrorMessage::test_zero_div_zero, test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_keys, test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_values_msg, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_len, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_coalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_uncoalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_indices_msg, 
test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_nnz, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_sparse_dims, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_matching, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_matching, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_matching, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_matching, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseQuantized::test_matching_per_channel, test/test_testing.py::TestAssertCloseQuantized::test_matching_per_tensor, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_is_quantized, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_qscheme, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bfloat16, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bfloat16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_uint8, test/test_testing.py::TestTestParametrization::test_apply_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_compose_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_default_names, test/test_testing.py::TestTestParametrization::test_modules_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_multiple_handling_of_same_param_error, test/test_testing.py::TestTestParametrization::test_name_fn, test/test_testing.py::TestTestParametrization::test_ops_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_reparametrize, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_1, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_2, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_3, test/test_testing.py::TestTestParametrization::test_subtest_names, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_6, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_6, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_5, 
test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_6, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_name_non_primitive_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_invalid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_valid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_list_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_decorator_applies_module_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_multiple_handling_of_same_param_error_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_name_fn_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_decorator_applies_op_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_param_specific_decoration_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_1_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_2_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_3_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_4_cuda, 
test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_unparametrized_names_cuda, test/test_testing.py::TestImports::test_circular_dependencies, test/test_testing.py::TestImports::test_lazy_imports_are_lazy, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_functorch, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_torch, test/test_testing.py::TestImports::test_no_warning_on_import, test/test_testing.py::TestImports::test_not_import_sympy, test/test_testing.py::TestOpInfos::test_sample_input, test/test_testing.py::TestOpInfos::test_sample_input_metadata, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_T_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___radd___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rand___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rdiv___cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmod___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmul___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___ror___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rpow___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rsub___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rxor___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators__chunk_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_aminmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_arange_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_as_strided_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_atan2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bernoulli_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_and_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_left_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_right_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_xor_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bucketize_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cauchy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_max_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_min_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_complex_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_copysign_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cov_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_embed_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diff_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_floor_rounding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_no_rounding_mode_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_trunc_rounding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_empty_permuted_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eye_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fliplr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_flipud_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_float_power_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_floor_divide_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmod_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gather_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gcd_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ge_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_geometric_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gradient_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_heaviside_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_histogramdd_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hypot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igamma_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igammac_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_isclose_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_item_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_return_by_ref_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_kthvalue_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lcm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ldexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_le_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_cross_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_grad_oriented_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_log_normal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logaddexp_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logcumsumexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_and_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_xor_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_fill_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_max_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_maximum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mean_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_median_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_min_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_minimum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_movedim_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mul_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_multinomial_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_copy_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_native_layer_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ne_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_neg_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nextafter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_embedding_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gaussian_nll_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_group_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hardtanh_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hinge_embedding_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_huber_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_l1_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_margin_ranking_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multi_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multilabel_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_poisson_nll_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_prelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rms_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rrelu_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_soft_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_softshrink_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_normal_in_place_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ormqr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_polar_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_pow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_remainder_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_renorm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_roll_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rot90_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rsub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_bartlett_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_blackman_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_gaussian_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hann_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_kaiser_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_nuttall_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_u_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_h_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_he_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_laguerre_polynomial_l_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_legendre_polynomial_p_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_u_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_xlog1py_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_zeta_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sum_to_size_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_take_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_trace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_tril_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_triu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_true_divide_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_uniform_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vdot_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_where_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_xlogy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rsub___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rxor___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acosh_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_or_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bool_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_broadcast_tensors_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chalf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clone_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_copysign_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_deg2rad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_trunc_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_flatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gcd_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ge_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_half_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_hypot_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igammac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isneginf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isreal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_return_by_ref_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_unary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ldexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lgamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_xor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lt_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_min_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_elu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_margin_ranking_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_mish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multi_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu6_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_selu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_upsample_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_1_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_4_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_pow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsub_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hann_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signbit_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_airy_ai_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_entr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_h_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_he_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_laguerre_polynomial_l_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_log_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_xlog1py_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_trunc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_where_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_xlogy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_H_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_T_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___getitem___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmatmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rsub___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rxor___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__batch_norm_with_update_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__chunk_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__native_batch_norm_legit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_lengths_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_offsets_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__softmax_backward_data_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_put_accumulate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__upsample_bilinear2d_aa_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_decomposed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_alias_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_all_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_allclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_aminmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_any_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_arange_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argsort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argwhere_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_partial_views_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan2_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_baddbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bernoulli_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bincount_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_or_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_block_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bool_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_shapes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_to_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cartesian_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cauchy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chalf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_inverse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chunk_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clone_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_column_stack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_combinations_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_constant_pad_nd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_copysign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_corrcoef_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_count_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cov_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cross_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumulative_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_deg2rad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagflat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diff_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_trunc_rounding_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_einsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_permuted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_equal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eye_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft2_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flip_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fliplr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flipud_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frexp_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gather_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gcd_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ge_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geometric_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geqrf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gradient_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_half_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hash_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_histc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hypot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_i0_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igammac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_inner_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isinf_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isneginf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isreal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_istft_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_item_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_2inputs_2outputs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_4inputs_with_extra_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_return_by_ref_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_unary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kron_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kthvalue_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ldexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lerp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lgamma_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cond_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cross_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_det_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eig_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvalsh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_householder_product_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_grad_oriented_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_multi_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_subgradients_at_zero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_singular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_slogdet_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_triangular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svdvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorsolve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vander_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vecdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vector_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_normal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logcumsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logdet_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_xor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_unpack_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mH_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mT_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_log_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_scatter_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matrix_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_pool2d_with_indices_backward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_with_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_list_of_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_variadic_tensors_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_with_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_msort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_multinomial_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmedian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanquantile_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nansum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_dropout_backward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_full_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool3d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_alpha_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_channel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv1d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_similarity_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_ctc_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_elu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_bag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_with_train_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gaussian_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_glu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardswish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_huber_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_instance_norm_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_area_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_linear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_trilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_kl_div_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_l1_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_leaky_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_linear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_local_response_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_margin_ranking_loss_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mse_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_head_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_soft_margin_loss_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_one_hot_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_circular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_constant_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_reflect_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_negative_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pairwise_distance_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pdist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_unshuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_poisson_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu6_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rms_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_selu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_smooth_l1_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_soft_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_loss_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_unfold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_static_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_fro_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_inf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_nuc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_in_place_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_number_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ormqr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_outer_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pca_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pinverse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_4_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_quantile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rand_like_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ravel_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_renorm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_interleave_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize_as__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_roll_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rot90_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scalar_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_searchsorted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_scatter_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hann_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signbit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sin_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_mm_reduce_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_sampled_addmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_airy_ai_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_v_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_entr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_h_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_he_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_laguerre_polynomial_l_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_log_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtr_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_xlog1py_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_list_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sqrt_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_multiple_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_to_size_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_along_dim_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensor_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensordot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_sparse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_topk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch__scaled_mm_cuda_float8_e4m3fn, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapz_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triangular_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trunc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unflatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_uniform_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_consecutive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unravel_index_cuda_int64, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_real_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_where_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_xlogy_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zero__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_like_cuda_float32 2025-12-04T15:02:21.3420613Z 2025-12-04T15:02:21.3420878Z Finished test_testing 1/1 ... [2025-12-04 15:02:21.158185][21013.167409357], took 0.83min 2025-12-04T15:02:21.3421815Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_testing/test_testing-4c4caba52af0adff.xml 2025-12-04T15:02:21.3422759Z Running inductor/test_autoheuristic 1/1 ... [2025-12-04 15:02:21.298246][21013.307466111] 2025-12-04T15:02:21.3423283Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:02:21.3424363Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_autoheuristic.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:02:21.298584] 2025-12-04T15:02:27.5759152Z 2025-12-04T15:02:27.5760820Z inductor/test_autoheuristic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_autoheuristic_1.1_6939193d627efb00_.log 2025-12-04T15:02:27.5761675Z Running 0 items in this shard: 2025-12-04T15:02:27.5761866Z 2025-12-04T15:02:27.5762183Z Finished inductor/test_autoheuristic 1/1 ... [2025-12-04 15:02:27.575553][21019.584778307], took 0.10min 2025-12-04T15:02:27.5944808Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_autoheuristic/inductor.test_autoheuristic-10f7d7896ce04bc8.xml 2025-12-04T15:02:27.6554136Z Running inductor/test_cutedsl_template 1/1 ... 
[2025-12-04 15:02:27.655052][21019.664274421] 2025-12-04T15:02:27.6554660Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:02:27.6557485Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_template.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:02:27.655390] 2025-12-04T15:02:33.9325088Z 2025-12-04T15:02:33.9326058Z inductor/test_cutedsl_template 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_template_1.1_c65b62856ae46e85_.log 2025-12-04T15:02:33.9332118Z Running 13 items in this shard: test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cse_integration, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e_autotune, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_op_overrides, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_defines, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_get_output_hook, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_indented_buffer_usage, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_modification_subgraph, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_multiple_templates_unique_names, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_render_includes_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_aliasing, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_env_contains_hooks 2025-12-04T15:02:33.9337609Z 2025-12-04T15:02:33.9337940Z Finished inductor/test_cutedsl_template 1/1 ... 
[2025-12-04 15:02:33.932186][21025.941410258], took 0.10min 2025-12-04T15:02:33.9515575Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-c4d4e9aba2280ad9.xml 2025-12-04T15:02:34.0357579Z Running inductor/test_benchmark_fusion 1/1 ... [2025-12-04 15:02:34.035375][21026.044596486] 2025-12-04T15:02:34.0358104Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:02:34.0360690Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmark_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:02:34.035692] 2025-12-04T15:03:05.2106365Z 2025-12-04T15:03:05.2107524Z inductor/test_benchmark_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmark_fusion_1.1_f16e3698532d27f8_.log 2025-12-04T15:03:05.2116337Z Running 16 items in this shard: test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_avoid_register_spilling_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_foreach_kernel_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_register_spills_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_resnet18_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_softmax_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_tield_kernel_fusion_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkingTest::test_benchmark_on_non_zero_device, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_changed_layout, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_extern_code, 
test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_template_code, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_avoid_register_spilling_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_foreach_kernel_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_register_spills_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_resnet18_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_softmax_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_tield_kernel_fusion_cpu 2025-12-04T15:03:05.2123397Z 2025-12-04T15:03:05.2123729Z Finished inductor/test_benchmark_fusion 1/1 ... [2025-12-04 15:03:05.210321][21057.219544998], took 0.52min 2025-12-04T15:03:05.2306053Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-8a04be886b6d69cf.xml 2025-12-04T15:03:05.3130755Z Running inductor/test_remote_cache 1/1 ... [2025-12-04 15:03:05.312562][21057.321785061] 2025-12-04T15:03:05.3131348Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:03:05.3133736Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_remote_cache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:03:05.312907] 2025-12-04T15:03:08.9345850Z 2025-12-04T15:03:08.9346573Z inductor/test_remote_cache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_remote_cache_1.1_e90358269eb2823f_.log 2025-12-04T15:03:08.9348325Z Running 3 items in this shard: test/inductor/test_remote_cache.py::TestRemoteCache::test_failure_logging, test/inductor/test_remote_cache.py::TestRemoteCache::test_failure_no_sample, test/inductor/test_remote_cache.py::TestRemoteCache::test_normal_logging 2025-12-04T15:03:08.9349448Z 2025-12-04T15:03:08.9349760Z Finished inductor/test_remote_cache 1/1 ... [2025-12-04 15:03:08.934250][21060.943470308], took 0.06min 2025-12-04T15:03:08.9551000Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-c7e05865cddca77f.xml 2025-12-04T15:03:08.9948411Z Running inductor/test_coordinate_descent_tuner 1/1 ... [2025-12-04 15:03:08.994501][21061.003726562] 2025-12-04T15:03:08.9948959Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:03:08.9952368Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_coordinate_descent_tuner.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:03:08.994851] 2025-12-04T15:03:19.4809206Z 2025-12-04T15:03:19.4810449Z inductor/test_coordinate_descent_tuner 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_coordinate_descent_tuner_1.1_2fd6afd7cb5bda25_.log 2025-12-04T15:03:19.4813551Z Running 5 items in this shard: test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_abs_function, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_get_neighbour_values, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_no_neighbors, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_persistent_reduction, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_value_too_large 2025-12-04T15:03:19.4816315Z 2025-12-04T15:03:19.4816773Z Finished inductor/test_coordinate_descent_tuner 1/1 ... [2025-12-04 15:03:19.480500][21071.48972451], took 0.17min 2025-12-04T15:03:19.5000464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6d20a7277844030b.xml 2025-12-04T15:03:19.5735297Z Running inductor/test_inplace_padding 1/1 ... [2025-12-04 15:03:19.573146][21071.582368577] 2025-12-04T15:03:19.5736002Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:03:19.5738117Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inplace_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:03:19.573460] 2025-12-04T15:03:38.0253392Z 2025-12-04T15:03:38.0254466Z inductor/test_inplace_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inplace_padding_1.1_25c4b19bcfb0badf_.log 2025-12-04T15:03:38.0258896Z Running 9 items in this shard: test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel_max_autotune, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_input, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_output, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero_cpp_wrapper, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_too_large, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_due_to_fusion, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_input 2025-12-04T15:03:38.0262669Z 2025-12-04T15:03:38.0263003Z Finished inductor/test_inplace_padding 1/1 ... [2025-12-04 15:03:38.024837][21090.034058883], took 0.31min 2025-12-04T15:03:38.0448531Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-6a2d2929a87aa7f5.xml 2025-12-04T15:03:38.1321153Z Running inductor/test_cudacodecache 1/1 ... [2025-12-04 15:03:38.131755][21090.140978019] 2025-12-04T15:03:38.1321684Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:03:38.1324956Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudacodecache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:03:38.132074] 2025-12-04T15:03:45.8616030Z 2025-12-04T15:03:45.8616926Z inductor/test_cudacodecache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cudacodecache_1.1_20e9a908d42a6261_.log 2025-12-04T15:03:45.8618757Z Running 3 items in this shard: test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_async_compile, test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_compilation_error, test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_cuda_load 2025-12-04T15:03:45.8620001Z 2025-12-04T15:03:45.8620327Z Finished inductor/test_cudacodecache 1/1 ... [2025-12-04 15:03:45.861267][21097.870491369], took 0.13min 2025-12-04T15:03:45.8813238Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-b498ae4cc20525c9.xml 2025-12-04T15:03:45.9475686Z Running inductor/test_minifier_utils 1/1 ... [2025-12-04 15:03:45.947193][21097.956416342] 2025-12-04T15:03:45.9476202Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:03:45.9478884Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:03:45.947500] 2025-12-04T15:03:50.5206994Z 2025-12-04T15:03:50.5208166Z inductor/test_minifier_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_utils_1.1_82d82b53a102b66f_.log 2025-12-04T15:03:50.5210036Z Running 3 items in this shard: test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_convert_module_to_string, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_invalid_output, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_non_exportable 2025-12-04T15:03:50.5211293Z 2025-12-04T15:03:50.5211615Z Finished inductor/test_minifier_utils 1/1 ... [2025-12-04 15:03:50.520290][21102.529514054], took 0.08min 2025-12-04T15:03:50.5404240Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-4c5fe50d62df582d.xml 2025-12-04T15:03:50.5707921Z Running inductor/test_debug_trace 1/1 ... [2025-12-04 15:03:50.570417][21102.579642159] 2025-12-04T15:03:50.5708471Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:03:50.5711097Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_debug_trace.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:03:50.570715] 2025-12-04T15:04:06.2173852Z 2025-12-04T15:04:06.2175012Z inductor/test_debug_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_debug_trace_1.1_cc4f32af9453e690_.log 2025-12-04T15:04:06.2176736Z Running 3 items in this shard: test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_multi_tempalte, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_printer_const, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_trace 2025-12-04T15:04:06.2177859Z 2025-12-04T15:04:06.2178168Z Finished inductor/test_debug_trace 1/1 ... [2025-12-04 15:04:06.217037][21118.226261963], took 0.26min 2025-12-04T15:04:06.2372676Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-179ecdae5d21ef0e.xml 2025-12-04T15:04:06.3228854Z Running export/test_tree_utils 1/1 ... [2025-12-04 15:04:06.322533][21118.331756169] 2025-12-04T15:04:06.3229367Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:04:06.3232225Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tree_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:04:06.322847] 2025-12-04T15:04:09.9443947Z 2025-12-04T15:04:09.9445207Z export/test_tree_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tree_utils_1.1_0e627f819fabbb55_.log 2025-12-04T15:04:09.9446554Z Running 2 items in this shard: test/export/test_tree_utils.py::TestTreeUtils::test_equivalence_check, test/export/test_tree_utils.py::TestTreeUtils::test_reorder_kwargs 2025-12-04T15:04:09.9447288Z 2025-12-04T15:04:09.9447595Z Finished export/test_tree_utils 1/1 ... 
[2025-12-04 15:04:09.943974][21121.953198926], took 0.06min 2025-12-04T15:04:09.9641951Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-bacbff1a865ff8bb.xml 2025-12-04T15:04:10.0007222Z Running inductor/test_triton_wrapper 1/1 ... [2025-12-04 15:04:10.000383][21122.009607265] 2025-12-04T15:04:10.0008105Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:04:10.0010189Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_wrapper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:04:10.000680] 2025-12-04T15:04:26.5969452Z 2025-12-04T15:04:26.5970810Z inductor/test_triton_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_wrapper_1.1_25aa967110a2fbe1_.log 2025-12-04T15:04:26.5971978Z Running 1 items in this shard: test/inductor/test_triton_wrapper.py::TestTritonWrapper::test_wrapper_using_gpu_seed 2025-12-04T15:04:26.5972502Z 2025-12-04T15:04:26.5972832Z Finished inductor/test_triton_wrapper 1/1 ... [2025-12-04 15:04:26.596486][21138.605710798], took 0.28min 2025-12-04T15:04:26.6177111Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-e71c26709471ff2e.xml 2025-12-04T15:04:26.6906923Z Running inductor/test_static_cuda_launcher 1/1 ... [2025-12-04 15:04:26.690292][21138.699514947] 2025-12-04T15:04:26.6907585Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:04:26.6909648Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_static_cuda_launcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:04:26.690607] 2025-12-04T15:04:41.1340659Z 2025-12-04T15:04:41.1341901Z inductor/test_static_cuda_launcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_static_cuda_launcher_1.1_0c71a221d8835012_.log 2025-12-04T15:04:41.1349813Z Running 17 items in this shard: test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_basic, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_basic_1arg, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_constexpr, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_high_shared_mem, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_implied_constant, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_empty_tensor, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_many_args, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_no_args, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_signed_integers, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_too_high_shared_mem, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_unsigned_integers, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_any, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_basic_compile, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_disable_static_cuda_launcher, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_empty_tensor, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_incompatible_code, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_static_launch_user_defined_triton_kernels 2025-12-04T15:04:41.1357288Z 2025-12-04T15:04:41.1357631Z Finished 
inductor/test_static_cuda_launcher 1/1 ... [2025-12-04 15:04:41.133598][21153.14281808], took 0.24min 2025-12-04T15:04:41.1548445Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-45ff8ae422230f99.xml 2025-12-04T15:04:41.2384129Z Running inductor/test_provenance_tracing 1/1 ... [2025-12-04 15:04:41.237889][21153.247111481] 2025-12-04T15:04:41.2384862Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:04:41.2386837Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_provenance_tracing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:04:41.238232] 2025-12-04T15:05:50.5854251Z 2025-12-04T15:05:50.5855996Z inductor/test_provenance_tracing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_provenance_tracing_1.1_80110daa3530439c_.log 2025-12-04T15:05:50.5869725Z Running 16 items in this shard: test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_combo_kernel, test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_cpu, test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_cuda, test/inductor/test_provenance_tracing.py::TestProvenanceTracingArtifact::test_triton_kernel_to_post_grad_tracing_extern_kernel, test/inductor/test_provenance_tracing.py::TestProvenanceTracingNodeMapping::test_create_node_mapping, test/inductor/test_provenance_tracing.py::TestProvenanceTracingNodeMeta::test_pattern_matcher_transfer_meta, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_cpu_extern_kernel, 
test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_create_kernel_information_json_function, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_deferred_triton_kernels, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_kernel_information_generation, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_no_kernel_information_without_provenance_tracking, test/inductor/test_provenance_tracing.py::TestProvenanceTracingStackTraces::test_tlparse_kernel_stack_traces, test/inductor/test_provenance_tracing.py::TestProvenanceTracingKernelContextCpu::test_aoti_python_stack_traces_cpu, test/inductor/test_provenance_tracing.py::TestProvenanceTracingKernelContextCpu::test_jit_inductor_with_flag_cpu, test/inductor/test_provenance_tracing.py::TestProvenanceTracingKernelContextGpu::test_aoti_python_stack_traces_cuda, test/inductor/test_provenance_tracing.py::TestProvenanceTracingKernelContextGpu::test_jit_inductor_with_flag_cuda 2025-12-04T15:05:50.5881838Z 2025-12-04T15:05:50.5882339Z Finished inductor/test_provenance_tracing 1/1 ... [2025-12-04 15:05:50.585034][21222.594257927], took 1.16min 2025-12-04T15:05:50.6071674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_provenance_tracing/inductor.test_provenance_tracing-6455ccf06df051be.xml 2025-12-04T15:05:50.6869353Z Running inductor/test_memory_planning 1/1 ... [2025-12-04 15:05:50.686485][21222.695707979] 2025-12-04T15:05:50.6870202Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:05:50.6871699Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_memory_planning.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:05:50.686803] 2025-12-04T15:06:08.4856308Z 2025-12-04T15:06:08.4857478Z inductor/test_memory_planning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_memory_planning_1.1_fa1d6b036138d22f_.log 2025-12-04T15:06:08.4859688Z Running 4 items in this shard: test/inductor/test_memory_planning.py::TestMemoryPlanning::test_aoti, test/inductor/test_memory_planning.py::TestMemoryPlanning::test_cpp_wrapper, test/inductor/test_memory_planning.py::TestMemoryPlanning::test_python_wrapper, test/inductor/test_memory_planning.py::TestMemoryPlanning::test_unbacked_symint 2025-12-04T15:06:08.4861212Z 2025-12-04T15:06:08.4861564Z Finished inductor/test_memory_planning 1/1 ... [2025-12-04 15:06:08.485153][21240.494377644], took 0.30min 2025-12-04T15:06:08.5061752Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory_planning/inductor.test_memory_planning-d9b25b367275156e.xml 2025-12-04T15:06:08.6044730Z Running export/test_cpp_serdes 1/1 ... [2025-12-04 15:06:08.604080][21240.613302666] 2025-12-04T15:06:08.6045227Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:06:08.6047815Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_cpp_serdes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:06:08.604399] 2025-12-04T15:07:32.8346367Z 2025-12-04T15:07:32.8347463Z export/test_cpp_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_cpp_serdes_1.1_75563679f31ba4f4_.log 2025-12-04T15:07:32.8532246Z Running 431 items in this shard: test/export/test_cpp_serdes.py::CppSerdesTestExport::test__scaled_dot_product_flash_attention_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_additional_inputs_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_allow_explicit_guards_as_runtime_asserts_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_annotate_on_assert_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_args_type_checked_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_aten_lift_fresh_copy_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_attention_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_attr_assignment_extra_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_constrain_size_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_dynamic_shapes_constant_relation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_dynamic_shapes_linear_relation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_automatic_dynamic_shapes_simple_equality_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_baddbmm_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_basic_non_strict_fake_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_basic_non_strict_real_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_bincount_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_buffer_util_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_capture_subclass_constructor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_capture_subclass_constructor_torch_ir_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_capture_subclass_wrong_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_ccode_python_mod_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cdist_forward_compute_mode_zero_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_check_specialized_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_checks_to_constrain_range_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cleanup_dynamic_markers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_colin_unbacked_backed_vr_sub_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_colon_parameter_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_compiling_state_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_access_identical_symint_closure_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_branches_return_constant_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_branches_return_same_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_contains_unbacked_no_escape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_int_closure_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_with_module_stack_export_with_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cond_with_module_stack_export_with_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_aliasing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_input_naming_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_no_user_inp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_output_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_output_dup_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_requires_grad_const_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_return_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_tensor_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_tensor_with_non_functional_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constant_tensor_with_non_functional_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_decomp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_size_in_eager_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_size_with_constrain_value_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_constrain_size_with_various_cases_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_conv_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_crop_like_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_cse_for_symint_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_auto_functionalize_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_auto_functionalize_pre_dispatch_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_auto_warn_pre_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_op_preserve_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_pytree_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_custom_tag_metadata_re_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_decomp_batch_norm_functional_predispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_decomp_item_in_prim_after_decomposition_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_decomp_item_in_prim_before_decomposition_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_default_decomposition_core_cia_ops_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_1_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_integer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_repeat_derived_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_simplified_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_derived_dim_repeat_derived_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_detect_leak_nonstrict_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_detect_leak_nonstrict_with_stacktrace_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_detect_leak_strict_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_gpu_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_mutation_float_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_device_to_static_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_1_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_auto_and_dim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_dynamic_divisibility_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_dynamic_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_hint_range_violations_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dim_hint_ranges_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_disable_forced_specializations_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_disable_forced_specializations_ok_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_gather_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_gather_into_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_reduce_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_all_to_all_single_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_distributed_reduce_scatter_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dont_duck_size_for_auto_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_double_lifted_constants_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_aliasing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_mutation_list_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_checks_mutation_with_nan_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_fake_kernel_inference_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_draft_export_infers_fake_kernel_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_duplicate_modules_with_non_persistent_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_lr_shift_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_bounds_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_builder_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_builder_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_builder_pytree_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_dataclass_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_inferred_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_serdes_generic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_serdes_user_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_serdes_various_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_spec_with_pytree_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_shapes_wrapped_with_shape_guards_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_dynamic_sym_round_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_ends_of_bounds_oblivious_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_enum_str_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_error_does_not_reference_eager_fallback_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_error_when_passing_mutating_primitive_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_exception_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_expand_copy_export_handles_implicit_true_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_api_with_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_as_backend_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_associative_scan_lifted_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_associative_scan_symbol_dim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_associative_scan_symbol_scandim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_aten_to_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_aten_to_unflatten_subclass_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cond_symbool_pred_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cond_warns_constant_pred_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_decomp_table_basic_pop_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_decomp_table_container_methods_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_op_lib_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_triton_kernel_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_custom_triton_kernel_mutable_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_cyclic_reference_leak_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomp_torture_case_1_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomp_torture_case_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomps_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_decomps_simple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_dynamo_config_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_run_decomp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_container_type_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_for_training_with_state_dict_hooks_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_default_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_keyword_only_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_pytree_kwargs_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_var_keyword_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_var_keyword_pytree_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_func_with_var_postional_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_function_schema_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_graph_with_no_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_input_mutation_bug_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_input_mutation_dynamic_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_input_mutation_static_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_leak_compile_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_linear_preserve_dynamic_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_max_nonstrict_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_max_onnx_reported_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_mod_constraints_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_preserve_linear_at_aot_level_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_preserve_linear_but_not_custom_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_rnn_variants_with_warning_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_scan_pytree_output_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_script_module_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_statically_known_true_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_then_compile_tensor_ctor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_autocast_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_fake_tensor_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_inline_constraints_complex_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_inline_constraints_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_set_grad_enabled_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_export_with_wrong_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_external_call_non_strict_real_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_fake_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_fake_weights_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_filter_traceback_frames_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_flex_attention_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_float_conversion_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_float_conversion_from_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_fqn_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_from_node_metadata_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_full_on_scalar_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_function_holding_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_hints_wrapper_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_hoo_inline_users_issue_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_if_functional_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_if_post_autograd_op_preserved_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inductor_backend_inside_nonstrict_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_class_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_class_method_recursive_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_function_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_inline_script_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_int_shape_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_intermediate_shape_comp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_invalid_pytree_dynamo_graph_capture_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_is_exporting_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_is_nonzero_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_isnonzero_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_issue_113041_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_issue_157289_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_issue_161902_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_istft_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_keep_composite_ops_invalid_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_keep_composite_ops_linear_convd_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_keep_composite_ops_linear_convd_for_training_ir_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_kwarg_dynamic_shapes_diff_order_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_kwargs_reorder_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_layer_norm_unbacked_normalized_shape_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_layer_sharing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_lazy_module_kwargs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_lifted_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_linear_conv_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_malformed_fqn_from_source_name_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_map_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_map_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_mask_nonzero_static_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_masked_select_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_math_pow_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_mismatched_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_mixed_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_dict_key_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_input_subclasses_parameterization_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_list_slice_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_module_with_dict_container_inp_out_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_modules_access_for_deleted_submodule_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_more_multidimensional_slicing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_multidimensional_slicing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_multinomial_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_multiple_definitions_same_name_dim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_namedtuple_input_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_native_multi_attention_head_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_dynamic_shapes_spec_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_fake_tensor_leak_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_with_constant_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_with_init_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nested_module_with_parameter_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nn_module_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nn_module_stack_shared_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_check_is_size_error_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_suggested_fixes_for_data_dependent_errors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_3_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_no_tensor_computation_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_arg_name_dynamic_shapes_api_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_persistent_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_strict_dynamic_shapes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_non_strict_dynamic_shapes_suggested_fixes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_none_buffers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nonstrict_retrace_preserves_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nonzero_2_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_nonzero_dynamic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_not_registered_parameter_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_operator_aten_tensor_mode_variant_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_output_node_name_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_pad_sequence_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_param_util_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_partial_patched_forward_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_collisions_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_collisions_hoo_subgraphs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_order_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_naming_order_variadic_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_placeholder_update_preserving_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_predispatch_cond_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_predispatch_grad_wrappers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_annotation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_module_call_signature_unflatten_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_requires_grad_placeholders_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_preserve_shape_dynamism_for_unused_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_profiling_code_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_python_asserts_with_sym_int_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_pytree_register_data_class_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_pytree_register_nested_data_class_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_range_constraints_with_replacement_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_alias_dtype_mismatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_bool_cast_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_errors_on_aliasing_custom_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_for_max_op_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_real_tensor_size_mismatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_redundant_assert_max_upper_bound_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_redundant_asserts_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_refine_dynamic_shapes_from_suggested_fixes_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_register_constant_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_repeat_interleave_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_replace_unbacked_with_very_large_upperbound_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_replaced_unbacked_bindings_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_reshape_view_helper_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_retracable_ep_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_retrace_pre_autograd_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_run_decomposition_supports_user_input_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_run_decompositions_keep_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_run_decompositions_keep_tensor_constant_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_runtime_assert_for_prim_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_runtime_assert_for_prm_str_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_runtime_assert_with_size_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sdpa_gqa_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sequential_slicing_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_example_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_grad_as_side_effect_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_grad_empty_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_set_grad_unflatten_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_setgrad_lifted_tensor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_shared_submodule_nn_module_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_simple_export_for_training_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_simple_unbacked_view_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_size_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_slice_nn_module_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_solver_unsupported_sympy_function_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_specialize_derived_dim_roots_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_split_const_gm_with_lifted_constants_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_stack_trace_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_stack_trace_make_fx_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_state_primitives_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_state_shape_attribute_assignment_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_state_tensors_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_static_dim_constraints_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_context_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_complicated_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_const_metadata_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclass_nested_attr_access_submodule_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclasses_parameterization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_subclasses_parameterization_nested_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggest_torch_checks_with_non_negative_check_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggest_torch_checks_with_regular_check_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggested_fixes_for_data_dependent_errors_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_suggested_fixes_new_roots_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sym_float_operators_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sym_or_sym_and_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_sym_sqrt_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symbool_item_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symfloat_item_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_additional_inputs_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_basic_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_ranges_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_shapes_collection_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_input_specialization_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_item_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_output_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_symint_tensor_return_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tag_ac_export_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tensor_attribute_zero_args_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tensor_constant_aten_to_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tensor_constant_with_wrapped_method_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_to_module_with_mutated_buffer_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_to_module_with_mutated_buffer_multiple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tolist_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_torch_check_eq_commutativity_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_torch_fn_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_trace_under_fake_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_train_eval_on_exported_preautograd_module_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_tril_dynamic_diagonal_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_triu_dynamic_diagonal_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_3d_matmul_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_bincount_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_bindings_for_divisible_u_symint_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_deferred_runtime_retrace_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_expand_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_infer_size_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_kth_value_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_linear_layer_norm_input_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_noncontig_lin_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_pad_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_scalar_constructor_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_slice_forward_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_slice_simple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_stack_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_to_cond_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_to_cond_passthrough_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unbacked_unsqueeze_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_asserts_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_buffer_update_child2parent_swap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_closure_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_isinstance_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_dispatch_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_shared_submodule_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_multiple_graphs_state_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_no_unroll_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_placeholder_update_child2parent_swap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_5_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_6_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_buf_8_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_const_preserving_3_1_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_const_preserving_3_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_6_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_9_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unflatten_random_dag_preserving_4_cpp_serdes, 
test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unused_aliases_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_unused_constant_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_uplift_common_custom_meta_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_uplift_common_custom_meta_with_multiple_calls_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_use_embedding_twice_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_user_input_and_buffer_mutation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_vmap_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_vmap_custom_autograd_function_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_vmap_to_assert_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_where_decomp_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_assert_separation_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_index_assertions_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_simple_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_while_loop_tensor_constant_idx_cpp_serdes, test/export/test_cpp_serdes.py::CppSerdesTestExport::test_wrapper_module_cpp_serdes 2025-12-04T15:07:32.8718731Z 2025-12-04T15:07:32.8719034Z Finished export/test_cpp_serdes 1/1 ... [2025-12-04 15:07:32.835381][21324.844604295], took 1.40min 2025-12-04T15:07:32.8720212Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_cpp_serdes/export.test_cpp_serdes-72e11f38870e0d13.xml 2025-12-04T15:07:33.0023764Z Running inductor/test_control_flow 2/4 ... 
[2025-12-04 15:07:33.001930][21325.011152706] 2025-12-04T15:07:33.0024345Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:07:33.0027102Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:07:33.002294] 2025-12-04T15:18:46.8926987Z 2025-12-04T15:18:46.8927872Z inductor/test_control_flow 2/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_2.4_3b4432ec9408add0_.log 2025-12-04T15:18:46.9073895Z Running 184 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_control_flow_with_precomputed_size, test/inductor/test_control_flow.py::CondTests::test_cond_decompose_ops_in_subgraph_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cpu_dynamic_True, 
test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_use_buffers_from_outer_scope, test/inductor/test_control_flow.py::CondTests::test_output_on_different_device, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cpu_dynamic_False_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cuda_dynamic_False, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_generic_backend_inductor_device_cuda, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cpu_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cuda_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_2_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_True_autograd_False, 
test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_True_autograd_True 2025-12-04T15:18:46.9175559Z 2025-12-04T15:18:46.9176189Z Finished inductor/test_control_flow 2/4 ... [2025-12-04 15:18:46.917303][21998.926518615], took 11.23min 2025-12-04T15:18:46.9393050Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5ad0fee917746162.xml 2025-12-04T15:18:48.1425297Z Uploading artifacts took 1.12 seconds 2025-12-04T15:18:48.1428801Z Running test_sort_and_select 1/1 ... [2025-12-04 15:18:48.142611][22000.151834358] 2025-12-04T15:18:48.1429274Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:18:48.1433685Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sort_and_select.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:18:48.143043] 2025-12-04T15:18:55.7722393Z 2025-12-04T15:18:55.7723730Z test_sort_and_select 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sort_and_select_1.1_bec7fa88f7702fb0_.log 2025-12-04T15:18:55.7763934Z Running 111 items in this shard: test/test_sort_and_select.py::TestSortAndSelectCUDA::test_complex_unsupported_cpu_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_cuda_uint8, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_devices_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_isin_different_dtypes_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_kthvalue_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_kthvalue_scalar_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_msort_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_output_discontiguous_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int64, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_1d_parallel_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_discontiguous_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_discontiguous_slow_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_expanded_tensor_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_large_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_large_slice_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_overflow_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_restride_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_sort_stable_none_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int32, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_against_numpy_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_stable_sort_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_1d_output_discontiguous_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_4d_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_arguments_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_integral_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_lower_precision_cuda_bfloat16, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_lower_precision_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_noncontiguous_gpu_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_nonfinite_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_quantized_scalar_input_cuda, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_bfloat16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_topk_zero_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_int8, 
test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_consecutive_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_bool, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_float16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_float32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_float64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int16, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int32, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int64, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_int8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_cuda_uint8, test/test_sort_and_select.py::TestSortAndSelectCUDA::test_unique_dim_cuda 2025-12-04T15:18:55.7803241Z 2025-12-04T15:18:55.7803511Z Finished test_sort_and_select 1/1 ... [2025-12-04 15:18:55.772023][22007.781246429], took 0.13min 2025-12-04T15:18:55.7942017Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sort_and_select/test_sort_and_select-049427debff60b53.xml 2025-12-04T15:18:55.9125681Z Running functorch/test_rearrange 1/1 ... [2025-12-04 15:18:55.912180][22007.921403] 2025-12-04T15:18:55.9126160Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:18:55.9129071Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_rearrange.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:18:55.912522] 2025-12-04T15:18:59.6338037Z 2025-12-04T15:18:59.6339120Z functorch/test_rearrange 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_rearrange_1.1_a7b15b1a80eb0b56_.log 2025-12-04T15:18:59.6343236Z Running 10 items in this shard: test/functorch/test_rearrange.py::TestRearrange::test_0_dim_tensor, test/functorch/test_rearrange.py::TestRearrange::test_collapsed_ellipsis_errors_out, test/functorch/test_rearrange.py::TestRearrange::test_concatenations_and_stacking, test/functorch/test_rearrange.py::TestRearrange::test_dimension_mismatch_no_ellipsis, test/functorch/test_rearrange.py::TestRearrange::test_dimension_mismatch_with_ellipsis, test/functorch/test_rearrange.py::TestRearrange::test_ellipsis_ops, test/functorch/test_rearrange.py::TestRearrange::test_rearrange_consistency, test/functorch/test_rearrange.py::TestRearrange::test_rearrange_permutations, test/functorch/test_rearrange.py::TestRearrange::test_squeeze, test/functorch/test_rearrange.py::TestRearrange::test_unsqueeze 2025-12-04T15:18:59.6346712Z 2025-12-04T15:18:59.6347019Z Finished functorch/test_rearrange 1/1 ... [2025-12-04 15:18:59.633461][22011.642685015], took 0.06min 2025-12-04T15:18:59.6553531Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-cccd30d217a8d074.xml 2025-12-04T15:18:59.7015160Z Running test_package 1/1 ... [2025-12-04 15:18:59.701176][22011.710400256] 2025-12-04T15:18:59.7015593Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:18:59.7018367Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_package.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:18:59.701510] 2025-12-04T15:19:05.3265408Z 2025-12-04T15:19:05.3266415Z test_package 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_package_1.1_f2ef9e9917fb97f5_.log 2025-12-04T15:19:05.3303913Z Running 137 items in this shard: test/test_package.py::TestAnalyze::test_trace_dependencies, test/test_package.py::TestDependencyAPI::test_allow_empty_with_error, test/test_package.py::TestDependencyAPI::test_broken_dependency, test/test_package.py::TestDependencyAPI::test_deny, test/test_package.py::TestDependencyAPI::test_deny_glob, test/test_package.py::TestDependencyAPI::test_extern, test/test_package.py::TestDependencyAPI::test_extern_glob, test/test_package.py::TestDependencyAPI::test_extern_glob_allow_empty, test/test_package.py::TestDependencyAPI::test_externing_c_extension, test/test_package.py::TestDependencyAPI::test_implicit_intern, test/test_package.py::TestDependencyAPI::test_intern_error, test/test_package.py::TestDependencyAPI::test_invalid_import, test/test_package.py::TestDependencyAPI::test_mock, test/test_package.py::TestDependencyAPI::test_mock_glob, test/test_package.py::TestDependencyAPI::test_mock_glob_allow_empty, test/test_package.py::TestDependencyAPI::test_pickle_mocked, test/test_package.py::TestDependencyAPI::test_pickle_mocked_all, test/test_package.py::TestDependencyAPI::test_repackage_mocked_module, test/test_package.py::TestDependencyHooks::test_extern_and_mock_hook, test/test_package.py::TestDependencyHooks::test_multiple_extern_hooks, test/test_package.py::TestDependencyHooks::test_multiple_mock_hooks, test/test_package.py::TestDependencyHooks::test_remove_hooks, test/test_package.py::TestDependencyHooks::test_single_hook, test/test_package.py::TestDiGraph::test_all_paths, test/test_package.py::TestDiGraph::test_contains, test/test_package.py::TestDiGraph::test_contains_non_hashable, test/test_package.py::TestDiGraph::test_edges, 
test/test_package.py::TestDiGraph::test_forward_closure, test/test_package.py::TestDiGraph::test_iter, test/test_package.py::TestDiGraph::test_node_attr_update, test/test_package.py::TestDiGraph::test_node_attrs, test/test_package.py::TestDiGraph::test_predecessor_not_in_graph, test/test_package.py::TestDiGraph::test_predecessors, test/test_package.py::TestDiGraph::test_successor_not_in_graph, test/test_package.py::TestDiGraph::test_successors, test/test_package.py::DirectoryReaderTest::test_importer_access, test/test_package.py::DirectoryReaderTest::test_loading_has_record, test/test_package.py::DirectoryReaderTest::test_loading_module, test/test_package.py::DirectoryReaderTest::test_loading_pickle, test/test_package.py::DirectoryReaderTest::test_package_resource_access, test/test_package.py::DirectoryReaderTest::test_resource_access_by_path, test/test_package.py::DirectoryReaderTest::test_resource_reader, test/test_package.py::DirectoryReaderTest::test_scriptobject_failure_message, test/test_package.py::TestGlobGroup::test_exclude, test/test_package.py::TestGlobGroup::test_exclude_from_all, test/test_package.py::TestGlobGroup::test_invalid_raw, test/test_package.py::TestGlobGroup::test_list_include_exclude, test/test_package.py::TestGlobGroup::test_one_star, test/test_package.py::TestGlobGroup::test_one_star_middle, test/test_package.py::TestGlobGroup::test_one_star_multiple_in_component, test/test_package.py::TestGlobGroup::test_one_star_partial, test/test_package.py::TestGlobGroup::test_one_star_partial_extension, test/test_package.py::TestGlobGroup::test_raw_two_star, test/test_package.py::TestGlobGroup::test_two_star, test/test_package.py::TestGlobGroup::test_two_star_end, test/test_package.py::TestGlobGroup::test_two_star_middle, test/test_package.py::TestGlobGroup::test_two_star_multiple, test/test_package.py::TestImporter::test_ordered_importer_basic, test/test_package.py::TestImporter::test_ordered_importer_whichmodule, 
test/test_package.py::TestImporter::test_package_importer_whichmodule_no_dunder_module, test/test_package.py::TestImporter::test_single_ordered_importer, test/test_package.py::TestImporter::test_sys_importer, test/test_package.py::TestImporter::test_sys_importer_roundtrip, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_fx_module, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_nn_module, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_torchscript_module, test/test_package.py::TestMangling::test_demangle_base, test/test_package.py::TestMangling::test_demangler_multiple_manglers, test/test_package.py::TestMangling::test_is_mangled, test/test_package.py::TestMangling::test_mangle_empty_errors, test/test_package.py::TestMangling::test_mangle_prefix, test/test_package.py::TestMangling::test_mangler_is_consistent, test/test_package.py::TestMangling::test_package_mangler, test/test_package.py::TestMangling::test_roundtrip_mangling, test/test_package.py::TestMangling::test_unique_manglers, test/test_package.py::TestMangling::test_unique_module_names, test/test_package.py::TestMisc::test_dunder_package_present, test/test_package.py::TestMisc::test_dunder_package_works_from_package, test/test_package.py::TestMisc::test_exporter_content_lists, test/test_package.py::TestMisc::test_file_structure, test/test_package.py::TestMisc::test_file_structure_has_file, test/test_package.py::TestMisc::test_inspect_class, test/test_package.py::TestMisc::test_is_from_package, test/test_package.py::TestMisc::test_load_python_version_from_package, test/test_package.py::TestMisc::test_loaders_that_remap_files_work_ok, test/test_package.py::TestMisc::test_python_version, test/test_package.py::TestMisc::test_std_lib_sys_hackery_checks, test/test_package.py::ModelTest::test_model_save, test/test_package.py::ModelTest::test_resnet, test/test_package.py::ModelTest::test_script_resnet, test/test_package.py::TestPackageFX::test_package_fx_custom_tracer, 
test/test_package.py::TestPackageFX::test_package_fx_package, test/test_package.py::TestPackageFX::test_package_fx_simple, test/test_package.py::TestPackageFX::test_package_fx_with_imports, test/test_package.py::TestPackageFX::test_package_fx_wrap, test/test_package.py::TestPackageFX::test_package_gm_preserve_stack_trace, test/test_package.py::TestPackageFX::test_package_then_fx, test/test_package.py::TestPackageScript::test_different_package_interface, test/test_package.py::TestPackageScript::test_different_package_script_class, test/test_package.py::TestPackageScript::test_load_shared_scriptmodules, test/test_package.py::TestPackageScript::test_load_shared_tensors, test/test_package.py::TestPackageScript::test_load_shared_tensors_repackaged, test/test_package.py::TestPackageScript::test_mixing_packaged_and_inline_modules, test/test_package.py::TestPackageScript::test_mixing_packaged_and_inline_modules_shared_code, test/test_package.py::TestPackageScript::test_package_interface, test/test_package.py::TestPackageScript::test_package_script_class, test/test_package.py::TestPackageScript::test_package_script_class_referencing_self, test/test_package.py::TestPackageScript::test_save_eager_mods_sharing_scriptmodule, test/test_package.py::TestPackageScript::test_save_independent_scriptmodules, test/test_package.py::TestPackageScript::test_save_repeat_scriptmodules, test/test_package.py::TestPackageScript::test_save_scriptmodule, test/test_package.py::TestPackageScript::test_save_scriptmodule_file, test/test_package.py::TestPackageScript::test_save_scriptmodule_only_necessary_code, test/test_package.py::TestPackageScript::test_save_scriptmodule_with_submods, test/test_package.py::TestPackageScript::test_save_scriptmodules_in_container, test/test_package.py::TestPackageScript::test_save_scriptmodules_submod_redefinition, test/test_package.py::TestPackageScript::test_save_shared_tensors, test/test_package.py::TestPackageScript::test_saving_and_scripting_packaged_mod, 
test/test_package.py::TestPackageScript::test_scriptmodules_repeat_save, test/test_package.py::TestPackageScript::test_tensor_sharing_pickle, test/test_package.py::TestRepackage::test_repackage_import_indirectly_via_parent_module, test/test_package.py::TestResources::test_importer_access, test/test_package.py::TestResources::test_package_resource_access, test/test_package.py::TestResources::test_resource_access_by_path, test/test_package.py::TestResources::test_resource_reader, test/test_package.py::TestSaveLoad::test_bad_dunder_imports, test/test_package.py::TestSaveLoad::test_dunder_imports, test/test_package.py::TestSaveLoad::test_exporting_mismatched_code, test/test_package.py::TestSaveLoad::test_pickle, test/test_package.py::TestSaveLoad::test_pickle_long_name_with_protocol_4, test/test_package.py::TestSaveLoad::test_save_imported_module, test/test_package.py::TestSaveLoad::test_save_imported_module_using_package_importer, test/test_package.py::TestSaveLoad::test_save_load_fp8, test/test_package.py::TestSaveLoad::test_save_module, test/test_package.py::TestSaveLoad::test_save_module_binary, test/test_package.py::TestSaveLoad::test_saving_source, test/test_package.py::TestSaveLoad::test_saving_string 2025-12-04T15:19:05.3341345Z 2025-12-04T15:19:05.3341592Z Finished test_package 1/1 ... [2025-12-04 15:19:05.326433][22017.335656914], took 0.09min 2025-12-04T15:19:05.3486816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_package/test_package-a2f65f799bf50b4a.xml 2025-12-04T15:19:05.4000932Z Running test_mkl_verbose 1/1 ... [2025-12-04 15:19:05.399743][22017.408968345] 2025-12-04T15:19:05.4001368Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:05.4003735Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkl_verbose.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:19:05.400040] 2025-12-04T15:19:13.0296894Z 2025-12-04T15:19:13.0297774Z test_mkl_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkl_verbose_1.1_a8ab8be9a564b785_.log 2025-12-04T15:19:13.0299353Z Running 2 items in this shard: test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_off, test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_on 2025-12-04T15:19:13.0300086Z 2025-12-04T15:19:13.0300360Z Finished test_mkl_verbose 1/1 ... [2025-12-04 15:19:13.029397][22025.038622174], took 0.13min 2025-12-04T15:19:13.0516391Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-c19a0c4320bf6e65.xml 2025-12-04T15:19:13.1348489Z Running test_utils_config_module 1/1 ... [2025-12-04 15:19:13.134471][22025.143694431] 2025-12-04T15:19:13.1348977Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:13.1351880Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_utils_config_module.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:19:13.134795] 2025-12-04T15:19:16.8558248Z 2025-12-04T15:19:16.8559236Z test_utils_config_module 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_utils_config_module_1.1_aa22a3cb4155f80d_.log 2025-12-04T15:19:16.8567018Z Running 22 items in this shard: test/test_utils_config_module.py::TestConfigModule::test_alias, test/test_utils_config_module.py::TestConfigModule::test_bad_jk_type, test/test_utils_config_module.py::TestConfigModule::test_base_value_loading, test/test_utils_config_module.py::TestConfigModule::test_codegen_config, test/test_utils_config_module.py::TestConfigModule::test_codegen_config_function, test/test_utils_config_module.py::TestConfigModule::test_dict_copy_semantics, test/test_utils_config_module.py::TestConfigModule::test_env_name_semantics, test/test_utils_config_module.py::TestConfigModule::test_env_name_string_semantics, test/test_utils_config_module.py::TestConfigModule::test_get_hash, test/test_utils_config_module.py::TestConfigModule::test_invalid_config_float, test/test_utils_config_module.py::TestConfigModule::test_invalid_config_int, test/test_utils_config_module.py::TestConfigModule::test_make_closur_patcher, test/test_utils_config_module.py::TestConfigModule::test_multi_env, test/test_utils_config_module.py::TestConfigModule::test_none_override_semantics, test/test_utils_config_module.py::TestConfigModule::test_overrides, test/test_utils_config_module.py::TestConfigModule::test_patch, test/test_utils_config_module.py::TestConfigModule::test_reference_is_default, test/test_utils_config_module.py::TestConfigModule::test_reference_semantics, test/test_utils_config_module.py::TestConfigModule::test_save_config, test/test_utils_config_module.py::TestConfigModule::test_save_config_portable, test/test_utils_config_module.py::TestConfigModule::test_type_loading, test/test_utils_config_module.py::TestConfigModule::test_unittest_patch 2025-12-04T15:19:16.8574245Z 
2025-12-04T15:19:16.8574535Z Finished test_utils_config_module 1/1 ... [2025-12-04 15:19:16.855501][22028.864725932], took 0.06min 2025-12-04T15:19:16.8778208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_utils_config_module/test_utils_config_module-cd73bdff208ab311.xml 2025-12-04T15:19:16.9111410Z Running test_hop_infra 1/1 ... [2025-12-04 15:19:16.910833][22028.920057255] 2025-12-04T15:19:16.9111857Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:16.9115066Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_hop_infra.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:16.911139] 2025-12-04T15:19:21.0329109Z 2025-12-04T15:19:21.0329908Z test_hop_infra 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hop_infra_1.1_f77bb32afa422f2e_.log 2025-12-04T15:19:21.0331569Z Running 3 items in this shard: test/test_hop_infra.py::TestHOPInfra::test_all_hops_are_imported, test/test_hop_infra.py::TestHOPInfra::test_all_hops_have_opinfo, test/test_hop_infra.py::TestHOPInfra::test_imports_from_all_work 2025-12-04T15:19:21.0332525Z 2025-12-04T15:19:21.0332775Z Finished test_hop_infra 1/1 ... [2025-12-04 15:19:21.032563][22033.041787652], took 0.07min 2025-12-04T15:19:21.0549096Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_hop_infra/test_hop_infra-d1efcb546b726ee3.xml 2025-12-04T15:19:21.0905091Z Running test_appending_byte_serializer 1/1 ... 
[2025-12-04 15:19:21.090173][22033.099397978] 2025-12-04T15:19:21.0905624Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:21.0908361Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_appending_byte_serializer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:21.090474] 2025-12-04T15:19:24.7623793Z 2025-12-04T15:19:24.7625040Z test_appending_byte_serializer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_appending_byte_serializer_1.1_7e52ee648e02aa85_.log 2025-12-04T15:19:24.7627068Z Running 3 items in this shard: test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_checksum, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_class, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_int 2025-12-04T15:19:24.7628471Z 2025-12-04T15:19:24.7628824Z Finished test_appending_byte_serializer 1/1 ... [2025-12-04 15:19:24.761916][22036.771140244], took 0.06min 2025-12-04T15:19:24.7845629Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_appending_byte_serializer/test_appending_byte_serializer-db1af3fc87bd6240.xml 2025-12-04T15:19:24.8253623Z Running test_ao_sparsity 1/1 ... [2025-12-04 15:19:24.824995][22036.834220217] 2025-12-04T15:19:24.8254227Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:24.8256861Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ao_sparsity.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:19:24.825301] 2025-12-04T15:19:36.9104543Z 2025-12-04T15:19:36.9105532Z test_ao_sparsity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_ao_sparsity_1.1_c127cba34d71d100_.log 2025-12-04T15:19:36.9134779Z Running 88 items in this shard: test/test_ao_sparsity.py::TestQuantizedSparseKernels::test_sparse_qlinear, test/test_ao_sparsity.py::TestQuantizedSparseLayers::test_sparse_qlinear, test/test_ao_sparsity.py::TestQuantizedSparseLayers::test_sparse_qlinear_serdes, test/test_ao_sparsity.py::TestFakeSparsity::test_jit_trace, test/test_ao_sparsity.py::TestFakeSparsity::test_masking_logic, test/test_ao_sparsity.py::TestFakeSparsity::test_state_dict_preserved, test/test_ao_sparsity.py::TestFakeSparsity::test_weights_parametrized, test/test_ao_sparsity.py::TestCubicScheduler::test_constructor, test/test_ao_sparsity.py::TestCubicScheduler::test_step, test/test_ao_sparsity.py::TestScheduler::test_constructor, test/test_ao_sparsity.py::TestScheduler::test_lambda_scheduler, test/test_ao_sparsity.py::TestScheduler::test_order_of_steps, test/test_ao_sparsity.py::TestScheduler::test_step, test/test_ao_sparsity.py::TestBaseSparsifier::test_constructor, test/test_ao_sparsity.py::TestBaseSparsifier::test_convert, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params1, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params2, test/test_ao_sparsity.py::TestBaseSparsifier::test_mask_squash_with_params3, test/test_ao_sparsity.py::TestBaseSparsifier::test_prepare_config, test/test_ao_sparsity.py::TestBaseSparsifier::test_state_dict, test/test_ao_sparsity.py::TestBaseSparsifier::test_step, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_constructor, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_prepare, 
test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_sparsity_levels, test/test_ao_sparsity.py::TestNearlyDiagonalSparsifier::test_step, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_constructor, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_mask_squash, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_prepare, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_sparsity_levels, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_step, test/test_ao_sparsity.py::TestWeightNormSparsifier::test_step_2_of_4, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_complex_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_constructor, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prepare_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prepare_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_activation_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_bias_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_padding_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_conv2d_pool_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_activation_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_bias_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_linear_linear, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_layernorm_linear_multiple_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_layernorm_linear_single_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_linear_multiple_layer, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_prune_lstm_linear_single_layer, 
test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_step_conv2d, test/test_ao_sparsity.py::TestBaseStructuredSparsifier::test_step_linear, test/test_ao_sparsity.py::TestFPGMPruner::test_compute_distance, test/test_ao_sparsity.py::TestFPGMPruner::test_update_mask, test/test_ao_sparsity.py::TestSaliencyPruner::test_lstm_saliency_pruner_update_mask, test/test_ao_sparsity.py::TestSaliencyPruner::test_saliency_pruner_update_mask, test/test_ao_sparsity.py::TestComposability::test_convert_without_squash_mask, test/test_ao_sparsity.py::TestComposability::test_fusion_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_q_prep_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_qat_prep_before_s_prep, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_fusion, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_q_prep, test/test_ao_sparsity.py::TestComposability::test_s_prep_before_qat_prep, test/test_ao_sparsity.py::TestFxComposability::test_q_prep_fx_before_s_prep, test/test_ao_sparsity.py::TestFxComposability::test_q_prep_fx_s_prep_ref_conv, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_before_q_prep_fx, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_before_qat_prep_fx, test/test_ao_sparsity.py::TestFxComposability::test_s_prep_q_prep_fx_ref, test/test_ao_sparsity.py::TestActivationSparsifier::test_activation_sparsifier, test/test_ao_sparsity.py::TestBaseDataScheduler::test_constructor, test/test_ao_sparsity.py::TestBaseDataScheduler::test_order_of_steps, test/test_ao_sparsity.py::TestBaseDataScheduler::test_state_dict, test/test_ao_sparsity.py::TestBaseDataScheduler::test_step, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_nn_embeddings, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_nn_parameters, test/test_ao_sparsity.py::TestBaseDataSparsifier::test_tensors, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_nn_embeddings, 
test/test_ao_sparsity.py::TestNormDataSparsifiers::test_nn_parameters, test/test_ao_sparsity.py::TestNormDataSparsifiers::test_tensors, test/test_ao_sparsity.py::TestQuantizationUtils::test_ptq_quantize_first, test/test_ao_sparsity.py::TestQuantizationUtils::test_ptq_sparsify_first, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_fqn_to_module_for_tensors, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_get_arg_info_from_tensor_fqn, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_get_arg_info_from_tensor_fqn_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn_fail, test/test_ao_sparsity.py::TestSparsityUtilFunctions::test_module_to_fqn_root 2025-12-04T15:19:36.9162986Z 2025-12-04T15:19:36.9163242Z Finished test_ao_sparsity 1/1 ... [2025-12-04 15:19:36.910131][22048.919355796], took 0.20min 2025-12-04T15:19:36.9332262Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ao_sparsity/test_ao_sparsity-47b60e8cb29a5ef6.xml 2025-12-04T15:19:37.0115810Z Running test_extension_utils 1/1 ... [2025-12-04 15:19:37.011248][22049.020471037] 2025-12-04T15:19:37.0116275Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:37.0119391Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_extension_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:19:37.011554] 2025-12-04T15:19:40.6324735Z 2025-12-04T15:19:40.6325572Z test_extension_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_extension_utils_1.1_7f66e708b7c7a8bc_.log 2025-12-04T15:19:40.6327057Z Running 2 items in this shard: test/test_extension_utils.py::TestExtensionUtils::test_external_module_register, test/test_extension_utils.py::TestExtensionUtils::test_external_module_register_with_renamed_backend 2025-12-04T15:19:40.6328003Z 2025-12-04T15:19:40.6328278Z Finished test_extension_utils 1/1 ... [2025-12-04 15:19:40.632156][22052.641379694], took 0.06min 2025-12-04T15:19:40.6564522Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_extension_utils/test_extension_utils-5e3baa267a09a3bb.xml 2025-12-04T15:19:40.6866587Z Running nn/attention/test_fa4 1/1 ... [2025-12-04 15:19:40.686191][22052.69541583] 2025-12-04T15:19:40.6867042Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:40.6868509Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/attention/test_fa4.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:19:40.686492] 2025-12-04T15:19:44.6091035Z 2025-12-04T15:19:44.6091900Z nn/attention/test_fa4 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.attention.test_fa4_1.1_59632c9893caec1b_.log 2025-12-04T15:19:44.6139879Z Running 66 items in this shard: test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_fa4_kernel_called_bfloat16_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_fa4_kernel_called_float16_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_4_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_4_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_4_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_4_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_8_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_8_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_8_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_1024_heads_8_head_dim_64_is_causal_True_cuda_bfloat16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_4_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_4_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_4_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_4_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_8_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_8_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_8_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_1_seq_len_512_heads_8_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_4_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_4_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_4_head_dim_64_is_causal_False_cuda_bfloat16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_4_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_8_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_8_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_8_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_1024_heads_8_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_4_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_4_head_dim_128_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_4_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_4_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_8_head_dim_128_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_8_head_dim_128_is_causal_True_cuda_bfloat16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_8_head_dim_64_is_causal_False_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_bfloat16_batch_2_seq_len_512_heads_8_head_dim_64_is_causal_True_cuda_bfloat16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_4_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_4_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_4_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_4_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_8_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_8_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_8_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_1024_heads_8_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_4_head_dim_128_is_causal_False_cuda_float16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_4_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_4_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_4_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_8_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_8_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_8_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_1_seq_len_512_heads_8_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_4_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_4_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_4_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_4_head_dim_64_is_causal_True_cuda_float16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_8_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_8_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_8_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_1024_heads_8_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_4_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_4_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_4_head_dim_64_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_4_head_dim_64_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_8_head_dim_128_is_causal_False_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_8_head_dim_128_is_causal_True_cuda_float16, test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_8_head_dim_64_is_causal_False_cuda_float16, 
test/nn/attention/test_fa4.py::TestFlashAttentionFA4CUDA::test_flash_attention_matches_math_float16_batch_2_seq_len_512_heads_8_head_dim_64_is_causal_True_cuda_float16 2025-12-04T15:19:44.6185877Z 2025-12-04T15:19:44.6186188Z Finished nn/attention/test_fa4 1/1 ... [2025-12-04 15:19:44.608688][22056.617911862], took 0.07min 2025-12-04T15:19:44.6332598Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.attention.test_fa4/nn.attention.test_fa4-2d55ad78ccee943a.xml 2025-12-04T15:19:44.6688000Z Running typing/test_python_operators 1/1 ... [2025-12-04 15:19:44.668377][22056.677602487] 2025-12-04T15:19:44.6688510Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:44.6691123Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'typing/test_python_operators.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:44.668719] 2025-12-04T15:19:48.9418842Z 2025-12-04T15:19:48.9419981Z typing/test_python_operators 1/1 was successful, full logs can be found in artifacts with path test/test-reports/typing.test_python_operators_1.1_1dbf7db937cf8b4b_.log 2025-12-04T15:19:48.9534707Z Running 318 items in this shard: test/typing/test_python_operators.py::TestPythonOperators::test_binary_a100_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a101_op_%_b101, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a102_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a103_op_%_b103, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a104_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a105_op_*_b105, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a106_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a107_op_*_b107, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a108_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a109_op_**_b109, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a110_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a111_op_**_b111, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a112_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a113_op_+_b113, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a114_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a115_op_+_b115, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a116_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a117_op_-_b117, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a118_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a119_op_-_b119, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a120_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a121_op_/_b121, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a122_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a123_op_/_b123, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a124_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a125_op_//_b125, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a126_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a127_op_//_b127, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a128_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a129_op_&_b129, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a130_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a131_op_&_b131, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a132_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a133_op_<<_b133, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a134_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a135_op_<<_b135, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a136_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a137_op_>>_b137, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a138_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a139_op_>>_b139, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a140_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a141_op_^_b141, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a142_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a143_op_^_b143, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a144_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a145_op_|_b145, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a146_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a147_op_|_b147, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a148_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a149_op_@_b149, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a150_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a151_op_@_b151, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a228_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a229_op_!=_b229, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a230_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a231_op_!=_b231, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a232_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a233_op_<_b233, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a234_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a235_op_<_b235, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a236_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a237_op_<=_b237, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a238_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a239_op_<=_b239, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a240_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a241_op_==_b241, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a242_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a243_op_==_b243, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a244_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a245_op_>_b245, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a246_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a247_op_>_b247, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a248_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a249_op_>=_b249, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a250_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a251_op_>=_b251, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a252_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a253_op_%_b253, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a254_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a255_op_%_b255, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a256_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a257_op_*_b257, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a258_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a259_op_*_b259, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a260_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a261_op_**_b261, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a262_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a263_op_**_b263, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a264_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a265_op_+_b265, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a266_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a267_op_+_b267, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a268_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a269_op_-_b269, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a270_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a271_op_-_b271, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a272_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a273_op_/_b273, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a274_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a275_op_/_b275, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a276_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a277_op_//_b277, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a278_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a279_op_//_b279, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a280_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a281_op_&_b281, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a282_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a283_op_&_b283, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a284_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a285_op_<<_b285, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a286_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a287_op_<<_b287, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a288_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a289_op_>>_b289, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a290_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a291_op_>>_b291, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a292_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a293_op_^_b293, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a294_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a295_op_^_b295, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a296_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a297_op_|_b297, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a298_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a299_op_|_b299, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a300_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a301_op_@_b301, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a302_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a303_op_@_b303, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a76_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a77_op_!=_b77, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a78_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a79_op_!=_b79, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a80_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a81_op_<_b81, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a82_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a83_op_<_b83, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a84_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a85_op_<=_b85, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a86_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a87_op_<=_b87, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a88_op_==_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a89_op_==_b89, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a90_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a91_op_==_b91, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a92_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a93_op_>_b93, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a94_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a95_op_>_b95, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a96_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a97_op_>=_b97, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a98_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a99_op_>=_b99, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b1, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b25, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b27, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b53, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b55, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b33, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b35, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b29, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b31, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b37, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b39, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b41, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b43, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b49, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b51, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b45, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b47, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b57, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b59, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b11, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b9, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b7, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b13, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b15, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b21, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b23, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b61, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b63, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b17, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b19, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b73, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b75, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b65, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b67, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b69, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b71, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b153, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b155, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b177, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b179, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b205, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b207, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b185, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b187, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b181, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b183, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b189, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b191, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b193, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b195, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b201, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b203, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b197, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b199, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b209, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b211, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b161, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b163, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b157, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b159, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b165, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b167, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b173, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b175, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b213, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b215, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b169, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b171, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b225, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b227, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b217, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b219, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b221, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b223, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_operators_are_correct_and_complete, test/typing/test_python_operators.py::TestPythonOperators::test_type_tests_are_complete, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a1, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a_3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a7, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a_3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a11, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a9, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a_3 2025-12-04T15:19:48.9647746Z 2025-12-04T15:19:48.9648174Z Finished typing/test_python_operators 1/1 ... 
[2025-12-04 15:19:48.941895][22060.951117554], took 0.07min 2025-12-04T15:19:48.9651066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/typing.test_python_operators/typing.test_python_operators-7b01e9f4c56696ce.xml 2025-12-04T15:19:49.0020205Z Running torch_np/test_dtype 1/1 ... [2025-12-04 15:19:49.001572][22061.010797149] 2025-12-04T15:19:49.0020893Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:49.0022183Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_dtype.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:49.001875] 2025-12-04T15:19:52.9244807Z 2025-12-04T15:19:52.9245938Z torch_np/test_dtype 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_dtype_1.1_8ba7a24ba508317e_.log 2025-12-04T15:19:52.9268566Z Running 44 items in this shard: test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'bool_', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'complex128', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'complex64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint32', 
test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_bool, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'bool_', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'complex128', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'complex64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.bool_, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.complex128, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.complex64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.dtype('bool'), test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float32, 
test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int8, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint8 2025-12-04T15:19:52.9299071Z 2025-12-04T15:19:52.9299479Z Finished torch_np/test_dtype 1/1 ... [2025-12-04 15:19:52.924168][22064.933392502], took 0.07min 2025-12-04T15:19:52.9501036Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_dtype/torch_np.test_dtype-50c590a3e827391c.xml 2025-12-04T15:19:53.0386002Z Running test_file_check 1/1 ... [2025-12-04 15:19:53.038230][22065.04745425] 2025-12-04T15:19:53.0386632Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:53.0389380Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_file_check.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:19:53.038545] 2025-12-04T15:19:58.3632107Z 2025-12-04T15:19:58.3633128Z test_file_check 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_file_check_1.1_e6044214ffdb04bb_.log 2025-12-04T15:19:58.3634244Z Running 2 items in this shard: test/test_file_check.py::TestFileCheck::test_all_python_api, test/test_file_check.py::TestFileCheck::test_not_run 2025-12-04T15:19:58.3636012Z 2025-12-04T15:19:58.3636270Z Finished test_file_check 1/1 ... [2025-12-04 15:19:58.362811][22070.372035669], took 0.09min 2025-12-04T15:19:58.3866933Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_file_check/test_file_check-c5f916d4f839abe2.xml 2025-12-04T15:19:58.4181199Z Running profiler/test_kineto 1/1 ... [2025-12-04 15:19:58.417733][22070.426957839] 2025-12-04T15:19:58.4182220Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:19:58.4184087Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_kineto.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:19:58.418042] 2025-12-04T15:20:13.5105352Z 2025-12-04T15:20:13.5106955Z profiler/test_kineto 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_kineto_1.1_3901a608b259f0c8_.log 2025-12-04T15:20:13.5109156Z Running 1 items in this shard: test/profiler/test_kineto.py::SimpleKinetoInitializationTest::test_kineto_profiler_with_environment_variable 2025-12-04T15:20:13.5110170Z 2025-12-04T15:20:13.5110627Z Finished profiler/test_kineto 1/1 ... 
[2025-12-04 15:20:13.510191][22085.519416141], took 0.25min 2025-12-04T15:20:13.5342244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_kineto/profiler.test_kineto-1437f02ea71dbd19.xml 2025-12-04T15:20:13.6772492Z Running functorch/test_ac_knapsack 1/1 ... [2025-12-04 15:20:13.676840][22085.686063825] 2025-12-04T15:20:13.6773263Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:20:13.6777381Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac_knapsack.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:20:13.677164] 2025-12-04T15:20:17.5493759Z 2025-12-04T15:20:17.5494843Z functorch/test_ac_knapsack 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_knapsack_1.1_a4a52ea27bf21bce_.log 2025-12-04T15:20:17.5503593Z Running 17 items in this shard: test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_full_joint_nx_graph, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_get_knapsack_memory_input, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_get_knapsack_runtime_input, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_get_non_ac_peak_memory, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_get_theoretical_max_runtime, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_inialize_from_graph, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_recomputable_node_only_graph, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_recomputable_node_only_graph_with_larger_graph_context, test/functorch/test_ac_knapsack.py::TestGraphInfoProvider::test_simplified_fx_joint_graph, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_evaluate_distribution_of_results_for_knapsack_algo, 
test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_evaluate_knapsack_output_accounting_for_backward_pass, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_evaluate_knapsack_output_not_accounting_for_backward_pass, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_evaluate_knapsack_output_with_wrong_sized_values, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_get_backward_memory_from_topologically_sorted_graph, test/functorch/test_ac_knapsack.py::TestKnapsackEvaluator::test_get_knee_point_memory_budget, test/functorch/test_ac_knapsack.py::TestActivationCheckpointingKnapsack::test_dp_knapsack, test/functorch/test_ac_knapsack.py::TestActivationCheckpointingKnapsack::test_dp_knapsack_sliding_hirschberg 2025-12-04T15:20:17.5511920Z 2025-12-04T15:20:17.5512333Z Finished functorch/test_ac_knapsack 1/1 ... [2025-12-04 15:20:17.549011][22089.558235709], took 0.06min 2025-12-04T15:20:17.5727489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ac_knapsack/functorch.test_ac_knapsack-a2f3dae1f99bc885.xml 2025-12-04T15:20:17.6198327Z Running torch_np/test_nep50_examples 1/1 ... [2025-12-04 15:20:17.619460][22089.628685653] 2025-12-04T15:20:17.6198830Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:20:17.6201558Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_nep50_examples.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:20:17.619755] 2025-12-04T15:20:22.8435000Z 2025-12-04T15:20:22.8436097Z torch_np/test_nep50_examples 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_nep50_examples_1.1_be93e5fc5572125c_.log 2025-12-04T15:20:22.9185140Z Running 1573 items in this shard: test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_3j + array(3, complex64), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_True + uint8(2), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array(1_0, float32) + 1e-14 == 1_0, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([0_1], float32) == float64(0_1), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([100], uint8) + 200, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + 1, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + 200, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + 300, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + array(1, int64), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1], uint8) + int64(1), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_0], float32) + 1e-14 == 1_0, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + 3, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + array(1_, float64), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + float64(1_), 
test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_array([1_], float32) + int64(3), test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_bool_(True) + 1, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_float32(1) + 1j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_float32(1) + 3e100, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_float32(5) + 5j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_int16(2) + 2, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_int16(4) + 4j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_int32(1) + 5j, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_uint8(1) + 2, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_uint8(1) + 300, test/torch_np/test_nep50_examples.py::TestNEP50Table::test_nep50_exceptions_example_uint8(100) + 200, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array25, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_add_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array25, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_arctan2_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar33_array33, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array23, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_and_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar31_array31, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_or_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar29_array29, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array19, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_bitwise_xor_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar27_array27, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array18, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_copysign_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divide_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_divmod_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_equal_scalar_True_array8, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array17, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array6, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_float_power_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_floor_divide_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array14, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmax_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array14, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmin_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array14, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_fmod_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array14, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_gcd_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array13, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array2, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array11, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array1, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_greater_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array11, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_heaviside_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_hypot_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_lcm_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_ldexp_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_2_0_array26, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_left_shift_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array24, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar33_array33, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array24, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_less_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar33_array33, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array23, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp2_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar31_array31, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logaddexp_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar29_array29, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array19, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_and_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar27_array27, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_1_array9, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array7, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_or_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array16, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_logical_xor_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_matmul_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_maximum_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_minimum_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_mod_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array5, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_modf_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array15, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array4, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_multiply_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array13, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array2, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_nextafter_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array11, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_not_equal_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array0, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_power_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array10, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_2_0_array26, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_remainder_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar34_array34, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array24, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_right_shift_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar31_array31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar32_array32, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array21, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array23, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_subtract_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar27_array27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar28_array28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar29_array29, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar30_array30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar31_array31, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar32_array32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar33_array33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar34_array34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar35_array35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_1_array9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_2_0_array26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_compare_ufuncs_name_true_divide_scalar_True_array8, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar27_array27_dtype27, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar28_array28_dtype28, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar29_array29_dtype29, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar30_array30_dtype30, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar31_array31_dtype31, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar32_array32_dtype32, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar33_array33_dtype33, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar34_array34_dtype34, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar35_array35_dtype35, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array10_dtype10, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array11_dtype11, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array12_dtype12, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array13_dtype13, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array14_dtype14, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array15_dtype15, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array16_dtype16, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array17_dtype17, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_1_array9_dtype9, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array18_dtype18, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array19_dtype19, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array20_dtype20, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array21_dtype21, 
test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array22_dtype22, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array23_dtype23, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array24_dtype24, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array25_dtype25, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_2_0_array26_dtype26, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array0_dtype0, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array1_dtype1, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array2_dtype2, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array3_dtype3, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array4_dtype4, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array5_dtype5, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array6_dtype6, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array7_dtype7, test/torch_np/test_nep50_examples.py::TestCompareToNumpy::test_direct_compare_scalar_True_array8_dtype8 2025-12-04T15:20:22.9909423Z 2025-12-04T15:20:22.9909750Z Finished torch_np/test_nep50_examples 1/1 ... [2025-12-04 15:20:22.846271][22094.855494381], took 0.09min 2025-12-04T15:20:22.9910888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_nep50_examples/torch_np.test_nep50_examples-87e42828c2fde829.xml 2025-12-04T15:20:22.9911888Z Running test_torch 1/1 ... 
[2025-12-04 15:20:22.955177][22094.964399308] 2025-12-04T15:20:22.9912291Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:20:22.9913269Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_torch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:20:22.955502] 2025-12-04T15:21:47.2122406Z 2025-12-04T15:21:47.2123267Z test_torch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_torch_1.1_ed3627b67cdc077e_.log 2025-12-04T15:21:47.2449776Z Running 976 items in this shard: test/test_torch.py::TestBasicVitalSigns::test_basic_vitals, test/test_torch.py::TestBasicVitalSigns::test_basic_vitals_read_write, test/test_torch.py::TestBasicVitalSigns::test_dataloader_vitals, test/test_torch.py::TestTorch::test_RNGState, test/test_torch.py::TestTorch::test_RNGStateAliasing, test/test_torch.py::TestTorch::test_RNG_after_pickle, test/test_torch.py::TestTorch::test_Size, test/test_torch.py::TestTorch::test_Size_concat_non_tuple_sequence, test/test_torch.py::TestTorch::test_Size_concat_wildcard, test/test_torch.py::TestTorch::test_Size_iter, test/test_torch.py::TestTorch::test_Size_scalar, test/test_torch.py::TestTorch::test_add_meta_scalar, test/test_torch.py::TestTorch::test_allow_tensor_metadata_change, test/test_torch.py::TestTorch::test_apply, test/test_torch.py::TestTorch::test_as_subclass, test/test_torch.py::TestTorch::test_assert_async, test/test_torch.py::TestTorch::test_backward_hooks_traverse, test/test_torch.py::TestTorch::test_batch_norm_cpu_inference, test/test_torch.py::TestTorch::test_bf16_supported_on_cpu, test/test_torch.py::TestTorch::test_bmm_multithreaded, test/test_torch.py::TestTorch::test_boxMullerState, test/test_torch.py::TestTorch::test_cat_neg_dim, test/test_torch.py::TestTorch::test_check, test/test_torch.py::TestTorch::test_chunk_neg_dim, 
test/test_torch.py::TestTorch::test_conj_neg_tolist, test/test_torch.py::TestTorch::test_conj_physical_meta_stride, test/test_torch.py::TestTorch::test_contains, test/test_torch.py::TestTorch::test_copy_broadcast, test/test_torch.py::TestTorch::test_copy_dtypes, test/test_torch.py::TestTorch::test_copy_float16, test/test_torch.py::TestTorch::test_copy_many_to_one, test/test_torch.py::TestTorch::test_copy_transpose, test/test_torch.py::TestTorch::test_cuda_not_built, test/test_torch.py::TestTorch::test_cummax_neg_dim, test/test_torch.py::TestTorch::test_cummin_neg_dim, test/test_torch.py::TestTorch::test_cumprod_neg_dim, test/test_torch.py::TestTorch::test_cumsum_neg_dim, test/test_torch.py::TestTorch::test_cxx_flags, test/test_torch.py::TestTorch::test_data_ptr_of_empty_tensor_with_storage, test/test_torch.py::TestTorch::test_data_ptr_of_empty_view_with_storage, test/test_torch.py::TestTorch::test_deepcopy_gradient, test/test_torch.py::TestTorch::test_deepcopy_parameter, test/test_torch.py::TestTorch::test_deterministic_fill_uninitialized_memory, test/test_torch.py::TestTorch::test_deterministic_flag, test/test_torch.py::TestTorch::test_device, test/test_torch.py::TestTorch::test_dim_order, test/test_torch.py::TestTorch::test_dir, test/test_torch.py::TestTorch::test_doc, test/test_torch.py::TestTorch::test_doc_template, test/test_torch.py::TestTorch::test_dot_data_use, test/test_torch.py::TestTorch::test_dtype_is_signed, test/test_torch.py::TestTorch::test_element_size, test/test_torch.py::TestTorch::test_empty_meta, test/test_torch.py::TestTorch::test_empty_storage_view, test/test_torch.py::TestTorch::test_equal, test/test_torch.py::TestTorch::test_error_msg_type_translation, test/test_torch.py::TestTorch::test_fill_diagonal, test/test_torch.py::TestTorch::test_format_scalar_meta, test/test_torch.py::TestTorch::test_from_buffer, test/test_torch.py::TestTorch::test_from_file, test/test_torch.py::TestTorch::test_gather_neg_dim, 
test/test_torch.py::TestTorch::test_generator_cpu, test/test_torch.py::TestTorch::test_get_cpu_capability, test/test_torch.py::TestTorch::test_has_internal_overlap, test/test_torch.py::TestTorch::test_has_storage, test/test_torch.py::TestTorch::test_index_add, test/test_torch.py::TestTorch::test_index_add_all_dtypes, test/test_torch.py::TestTorch::test_index_add_cornercase, test/test_torch.py::TestTorch::test_index_add_correctness, test/test_torch.py::TestTorch::test_index_add_neg_dim, test/test_torch.py::TestTorch::test_index_copy_neg_dim, test/test_torch.py::TestTorch::test_index_fill_neg_dim, test/test_torch.py::TestTorch::test_index_select_neg_dim, test/test_torch.py::TestTorch::test_invalid_arg_error_handling, test/test_torch.py::TestTorch::test_invalid_generator_raises, test/test_torch.py::TestTorch::test_is_nonzero, test/test_torch.py::TestTorch::test_is_same_size, test/test_torch.py::TestTorch::test_iter, test/test_torch.py::TestTorch::test_kthvalue_neg_dim, test/test_torch.py::TestTorch::test_linspace_logspace, test/test_torch.py::TestTorch::test_logcumsumexp_neg_dim, test/test_torch.py::TestTorch::test_manual_seed, test/test_torch.py::TestTorch::test_map, test/test_torch.py::TestTorch::test_map2, test/test_torch.py::TestTorch::test_max_neg_dim, test/test_torch.py::TestTorch::test_mean_neg_dim, test/test_torch.py::TestTorch::test_median_neg_dim, test/test_torch.py::TestTorch::test_memory_format, test/test_torch.py::TestTorch::test_memory_format_contiguous_returns_same_tensor_if_already_satisfies, test/test_torch.py::TestTorch::test_memory_format_empty, test/test_torch.py::TestTorch::test_min_neg_dim, test/test_torch.py::TestTorch::test_mode_neg_dim, test/test_torch.py::TestTorch::test_multinomial_invalid_probs, test/test_torch.py::TestTorch::test_nanmedian_neg_dim, test/test_torch.py::TestTorch::test_narrow_neg_dim, test/test_torch.py::TestTorch::test_nbytes, test/test_torch.py::TestTorch::test_ndim, test/test_torch.py::TestTorch::test_new, 
test/test_torch.py::TestTorch::test_newaxis_numpy_comparison, test/test_torch.py::TestTorch::test_newindex, test/test_torch.py::TestTorch::test_no_cuda_monkeypatch, test/test_torch.py::TestTorch::test_norm_neg_dim, test/test_torch.py::TestTorch::test_normal_shape, test/test_torch.py::TestTorch::test_numel, test/test_torch.py::TestTorch::test_parallel_info, test/test_torch.py::TestTorch::test_parsing_double, test/test_torch.py::TestTorch::test_parsing_int64, test/test_torch.py::TestTorch::test_parsing_intlist, test/test_torch.py::TestTorch::test_permute, test/test_torch.py::TestTorch::test_pickle, test/test_torch.py::TestTorch::test_pickle_dtype, test/test_torch.py::TestTorch::test_pickle_function, test/test_torch.py::TestTorch::test_pickle_generator, test/test_torch.py::TestTorch::test_pickle_parameter, test/test_torch.py::TestTorch::test_pickle_parameter_no_requires_grad, test/test_torch.py::TestTorch::test_pickle_size, test/test_torch.py::TestTorch::test_pin_memory, test/test_torch.py::TestTorch::test_print, test/test_torch.py::TestTorch::test_prod_neg_dim, test/test_torch.py::TestTorch::test_pyobj_preserved, test/test_torch.py::TestTorch::test_qengine, test/test_torch.py::TestTorch::test_renorm_neg_dim, test/test_torch.py::TestTorch::test_resizable, test/test_torch.py::TestTorch::test_reversed, test/test_torch.py::TestTorch::test_scatter_neg_dim, test/test_torch.py::TestTorch::test_select_neg_dim, test/test_torch.py::TestTorch::test_set_flush_denormal, test/test_torch.py::TestTorch::test_setting_real_imag_to_a_number, test/test_torch.py::TestTorch::test_show_config, test/test_torch.py::TestTorch::test_size_neg_dim, test/test_torch.py::TestTorch::test_size_stride, test/test_torch.py::TestTorch::test_sizeof, test/test_torch.py::TestTorch::test_slice, test/test_torch.py::TestTorch::test_slow_test, test/test_torch.py::TestTorch::test_sobolengine_bounds, test/test_torch.py::TestTorch::test_sobolengine_bounds_scrambled, 
test/test_torch.py::TestTorch::test_sobolengine_continuing, test/test_torch.py::TestTorch::test_sobolengine_continuing_scrambled, test/test_torch.py::TestTorch::test_sobolengine_default_dtype, test/test_torch.py::TestTorch::test_sobolengine_distribution, test/test_torch.py::TestTorch::test_sobolengine_distribution_scrambled, test/test_torch.py::TestTorch::test_sobolengine_draw, test/test_torch.py::TestTorch::test_sobolengine_draw_base2, test/test_torch.py::TestTorch::test_sobolengine_draw_base2_scrambled, test/test_torch.py::TestTorch::test_sobolengine_draw_scrambled, test/test_torch.py::TestTorch::test_sobolengine_fast_forward, test/test_torch.py::TestTorch::test_sobolengine_fast_forward_scrambled, test/test_torch.py::TestTorch::test_sobolengine_first_point, test/test_torch.py::TestTorch::test_sobolengine_high_dim, test/test_torch.py::TestTorch::test_sobolengine_raise, test/test_torch.py::TestTorch::test_sobolengine_reset, test/test_torch.py::TestTorch::test_sobolengine_reset_scrambled, test/test_torch.py::TestTorch::test_sort_neg_dim, test/test_torch.py::TestTorch::test_split_neg_dim, test/test_torch.py::TestTorch::test_split_with_sizes_copy_out, test/test_torch.py::TestTorch::test_squeeze_neg_dim, test/test_torch.py::TestTorch::test_std_neg_dim, test/test_torch.py::TestTorch::test_storage_base_init, test/test_torch.py::TestTorch::test_storage_base_new, test/test_torch.py::TestTorch::test_storage_byteswap, test/test_torch.py::TestTorch::test_storage_casts, test/test_torch.py::TestTorch::test_storage_cycle_via_dict, test/test_torch.py::TestTorch::test_storage_cycle_via_slots, test/test_torch.py::TestTorch::test_storage_dead_weak_ref, test/test_torch.py::TestTorch::test_storage_dealloc, test/test_torch.py::TestTorch::test_storage_dealloc_resurrected, test/test_torch.py::TestTorch::test_storage_dealloc_subclass_resurrected, test/test_torch.py::TestTorch::test_storage_dealloc_subclass_zombie, test/test_torch.py::TestTorch::test_storage_dict_dealloc, 
test/test_torch.py::TestTorch::test_storage_error, test/test_torch.py::TestTorch::test_storage_error_no_attribute, test/test_torch.py::TestTorch::test_storage_finalizer_dealloc, test/test_torch.py::TestTorch::test_storage_fix_weakref_no_leak, test/test_torch.py::TestTorch::test_storage_from_tensor_dealloc, test/test_torch.py::TestTorch::test_storage_from_tensor_dealloc_resurrected, test/test_torch.py::TestTorch::test_storage_from_tensor_dealloc_zombie, test/test_torch.py::TestTorch::test_storage_preserve_nonhermetic_in_hermetic_context, test/test_torch.py::TestTorch::test_storage_resurrected_weak_ref, test/test_torch.py::TestTorch::test_storage_slot_dealloc, test/test_torch.py::TestTorch::test_storage_thread_safety, test/test_torch.py::TestTorch::test_storage_weakref_dealloc, test/test_torch.py::TestTorch::test_structseq_repr, test/test_torch.py::TestTorch::test_subclass_preserved, test/test_torch.py::TestTorch::test_subclass_tensors, test/test_torch.py::TestTorch::test_sum_neg_dim, test/test_torch.py::TestTorch::test_swap_basic, test/test_torch.py::TestTorch::test_swap_fail_slots, test/test_torch.py::TestTorch::test_t_not_2d_error, test/test_torch.py::TestTorch::test_tensor_base_init, test/test_torch.py::TestTorch::test_tensor_base_new, test/test_torch.py::TestTorch::test_tensor_ctor_scalar, test/test_torch.py::TestTorch::test_tensor_cycle_via_dict, test/test_torch.py::TestTorch::test_tensor_cycle_via_slots, test/test_torch.py::TestTorch::test_tensor_dead_weak_ref, test/test_torch.py::TestTorch::test_tensor_dict_dealloc, test/test_torch.py::TestTorch::test_tensor_finalizer_dealloc, test/test_torch.py::TestTorch::test_tensor_fix_weakref_no_leak, test/test_torch.py::TestTorch::test_tensor_item_no_warning, test/test_torch.py::TestTorch::test_tensor_ressurecting_clear, test/test_torch.py::TestTorch::test_tensor_resurrected_weak_ref, test/test_torch.py::TestTorch::test_tensor_set, test/test_torch.py::TestTorch::test_tensor_set_errors, 
test/test_torch.py::TestTorch::test_tensor_slot_dealloc, test/test_torch.py::TestTorch::test_tensor_weakref_dealloc, test/test_torch.py::TestTorch::test_tensor_where_scalar, test/test_torch.py::TestTorch::test_tensor_with_grad_to_scalar_warning, test/test_torch.py::TestTorch::test_tensoriterator_output_setup, test/test_torch.py::TestTorch::test_terminate_handler_on_crash, test/test_torch.py::TestTorch::test_to, test/test_torch.py::TestTorch::test_to_with_tensor, test/test_torch.py::TestTorch::test_topk_neg_dim, test/test_torch.py::TestTorch::test_torch_from_file, test/test_torch.py::TestTorch::test_transpose_neg_dim, test/test_torch.py::TestTorch::test_type, test/test_torch.py::TestTorch::test_type_alias, test/test_torch.py::TestTorch::test_type_conversion_via_dtype_name, test/test_torch.py::TestTorch::test_typed_storage_deprecation_warning, test/test_torch.py::TestTorch::test_typed_storage_internal_no_warning, test/test_torch.py::TestTorch::test_unbind_neg_dim, test/test_torch.py::TestTorch::test_unflatten, test/test_torch.py::TestTorch::test_unfold_neg_dim, test/test_torch.py::TestTorch::test_unsqueeze_neg_dim, test/test_torch.py::TestTorch::test_upsample_nearest1d_meta, test/test_torch.py::TestTorch::test_upsample_nearest2d_meta, test/test_torch.py::TestTorch::test_var_neg_dim, test/test_torch.py::TestTorch::test_warn_types, test/test_torch.py::TestTorch::test_wildcard_import, test/test_torch.py::TestVitalSignsCudaCUDA::test_cuda_vitals_gpu_only_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test__local_scalar_dense_with_empty_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_float64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcdiv_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_cuda_errors_with_cpu_scalars_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_False_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_addcmul_use_cpu_scalar_True_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_assertRaisesRegex_ignore_msg_non_native_device_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_edge_cases_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_edge_cases_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_edge_cases_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_p_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_p_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_p_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bernoulli_self_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bfloat16_neg_abs_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bool_tensor_value_change_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_add_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_addcdiv_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_addcmul_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_atan2_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_copy_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_dist_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_div_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_eq_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_fmod_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_ge_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_gt_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_le_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_lerp_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_lt_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_map2_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_map_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_masked_fill_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_masked_scatter_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_masked_select_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_max_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_min_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_mul_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_ne_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_pow_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_remainder_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_broadcast_fn_sub_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_bool, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_bytes_to_scalar_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_kstest_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_no_inf_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cauchy_no_inf_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_cuda_backward_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_euclidean_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_grad_p_lt_1_no_nan_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_large_batch_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_non_contiguous_batch_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_non_contiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_norm_batch_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_norm_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cdist_same_inputs_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_check_tensor_all_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_check_tensor_internal_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_clone_all_dtypes_and_devices_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_clone_not_memory_dense_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_clone_zero_stride_dim_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_complex_half_experimental_warning_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_constants_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_conv_transposed_backward_agnostic_to_memory_format_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_conv_transposed_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_complex32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy__cuda_uint8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_all_dtypes_and_devices_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_math_view_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_mem_overlap_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_transpose_math_view_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_transpose_math_view_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_copy_transpose_math_view_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_corrcoef_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_corrcoef_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_corrcoef_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cov_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cpp_warnings_have_python_context_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cummax_cummin_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cummax_discontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cummin_discontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumprod_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumsum_64bit_indexing_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumsum_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_cumsum_outer_dim_64bit_indexing_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_scalar_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deepcopy_scalar_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_cumsum_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_bool, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_complex32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_empty_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_interpolate_bilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_replication_pad2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_float64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_deterministic_resize_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_device_guard_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_float64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_diff_noncontig_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_dim_function_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_discontiguous_out_cumsum_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_dist_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_dtypetensor_warnings_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_expected_failure_xla_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_kstest_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_no_zero_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_exponential_no_zero_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gather_backward_deterministic_path_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gather_backward_one_dim_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_float32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_geometric_kstest_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scale_will_not_overflow_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaler_deprecated_warning_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaler_pass_itself_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_accumulation_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach0_fused0_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach0_fused0_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach0_fused0_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach2_fused_True_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach2_fused_True_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach2_fused_True_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach_True_fused1_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach_True_fused1_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_autocast_foreach_True_fused1_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_clipping_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_clipping_separate_unscale_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_multiple_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_penalty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_state_dict_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_unscale_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_unscale_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_unscale_sparse_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_grad_scaling_update_scale_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_all_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_all_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_all_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_extreme_cases_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_extreme_cases_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_extreme_cases_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_spacing_list_length_error_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_spacing_list_length_error_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_spacing_list_length_error_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_gradient_type_promotion_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_hook_remove_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_add_large_inputs_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_add_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_copy_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_fill_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_index_put_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_int64_upsample3d_cuda_bfloat16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_invalid_shapes_grid_sampler_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_is_set_to_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_is_signed_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_complex32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e4m3fn, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e4m3fnuz, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e5m2, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_float8_e5m2fnuz, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_item_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_large_cumprod_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_large_cumsum_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_bool, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_binary_op_no_materialize_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_bfloat16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lazy_clone_view_materialize_cuda_uint8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_log_normal_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_logcumsumexp_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_lognormal_kstest_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_bool_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bfloat16_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bfloat16_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bool_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_bool_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex128_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex128_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex64_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_complex64_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float16_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float16_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float32_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float32_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float64_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_float64_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int16_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int16_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int32_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int32_uint8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int64_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int64_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int8_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_int8_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_uint8_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_cuda_uint8_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_fill_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_bool_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_inplace_noncontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_large_tensor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_scatter_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_complex128, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_masked_select_discontiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_clone_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_consistency_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_cpu_and_cuda_ops_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_empty_like_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_factory_like_functions_preserve_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_operators_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_preserved_after_permute_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_propagation_rules_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_to_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_type_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_memory_format_type_shortcuts_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_module_share_memory_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cpu_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cpu_cuda_float32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cpu_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_deterministic_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_deterministic_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_deterministic_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_device_constrain_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_empty_w_replacement_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_empty_wo_replacement_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_gpu_device_constrain_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_multinomial_rng_state_advance_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_narrow_copy_non_contiguous_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_narrow_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_no_nondeterministic_alert_interpolate_bilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_no_nondeterministic_alert_interpolate_trilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AdaptiveAvgPool2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AdaptiveAvgPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AdaptiveMaxPool2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_AvgPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_CTCLoss_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_EmbeddingBag_max_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_FractionalMaxPool2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_FractionalMaxPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxPool3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool1d_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool1d_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool1d_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool2d_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool2d_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool2d_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool3d_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool3d_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_MaxUnpool3d_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_NLLLoss_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReflectionPad1d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReflectionPad3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReplicationPad1d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReplicationPad2d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_ReplicationPad3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_bincount_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_grid_sample_2d_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_grid_sample_3d_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_histc_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_histc_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_bicubic_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_bilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_linear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_interpolate_trilinear_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_median_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_put_accumulate_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_alert_put_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_qint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_qint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_quint2x4, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_quint4x2, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nondeterministic_resize_quantized_cuda_quint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_normal_kstest_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_nullary_op_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pairwise_distance_empty_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_parallel_cow_materialize_error_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_and_graph_partition_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_and_graph_partition_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_and_graph_partition_SGD_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_between_unscale_and_step_AdamW_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_between_unscale_and_step_Adam_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_params_invalidated_with_grads_invalidated_between_unscale_and_step_SGD_cuda_float32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_pdist_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pdist_norm_large_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pickle_gradscaler_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_pin_memory_from_constructor_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_accumulate_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_empty_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_put_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_reduced_type_float_copy_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_reduced_type_float_copy_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_repeat_interleave_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scalar_check_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_bool_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_non_unique_index_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_one_dim_deterministic_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_add_to_large_input_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_bool_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_multiply_unsupported_dtypes_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_multiply_unsupported_dtypes_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_non_unique_index_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_operations_to_large_input_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_float16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_reduce_scalar_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_to_large_input_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_scatter_zero_size_index_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_serialization_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_default_tensor_type_warnings_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_set_storage_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_shift_mem_overlap_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_skip_xla_cuda, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_all_devices_non_blocking_False_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_all_devices_non_blocking_True_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int64, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_errors_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_from_tensor_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_meta_ok_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_qint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_qint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_quint4x2, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_quint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_setitem_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_storage_use_count_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_strides_propagation_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_sync_warning_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_float32, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_take_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_from_storage_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_set_errors_multigpu_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_shape_empty_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_bfloat16, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_int8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_storage_type_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_tensor_type_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_ternary_op_mem_overlap_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_bool, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_complex128, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_complex64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_int8, 
test/test_torch.py::TestTorchDeviceTypeCUDA::test_typed_storage_meta_cuda_uint8, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_bfloat16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_float16, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_float32, test/test_torch.py::TestTorchDeviceTypeCUDA::test_uniform_kstest_cuda_float64, test/test_torch.py::TestTorchDeviceTypeCUDA::test_untyped_storage_meta_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_warn_always_caught_cuda, test/test_torch.py::TestTorchDeviceTypeCUDA::test_where_scalar_handcrafted_values_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_advancedindex_mixed_cpu_devices_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_advancedindex_mixed_devices_error_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_clamp_cuda_float32, test/test_torch.py::TestDevicePrecisionCUDA::test_clamp_cuda_float64, test/test_torch.py::TestDevicePrecisionCUDA::test_clamp_cuda_int64, test/test_torch.py::TestDevicePrecisionCUDA::test_copy_broadcast_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_copy_noncontig_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_cuda_device_idx_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_device_serialization_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_float16, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_float32, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_float64, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int16, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int32, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int64, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_int8, test/test_torch.py::TestDevicePrecisionCUDA::test_from_sequence_cuda_uint8, 
test/test_torch.py::TestDevicePrecisionCUDA::test_index_add_bfloat16_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_multidevice_serialization_cuda, test/test_torch.py::TestDevicePrecisionCUDA::test_type_conversions_same_device_cuda 2025-12-04T15:21:47.2765405Z 2025-12-04T15:21:47.2765777Z Finished test_torch 1/1 ... [2025-12-04 15:21:47.213468][22179.222691672], took 1.40min 2025-12-04T15:21:47.2766678Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_torch/test_torch-6322eeaa434bd119.xml 2025-12-04T15:21:47.3459487Z Running xpu/test_gemm 1/1 ... [2025-12-04 15:21:47.345546][22179.354766493] 2025-12-04T15:21:47.3459925Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:21:47.3462635Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_gemm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:47.345915] 2025-12-04T15:21:51.4678816Z 2025-12-04T15:21:51.4679728Z xpu/test_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_gemm_1.1_db81f0dcd896f79f_.log 2025-12-04T15:21:51.4680422Z Running 0 items in this shard: 2025-12-04T15:21:51.4680620Z 2025-12-04T15:21:51.4680884Z Finished xpu/test_gemm 1/1 ... [2025-12-04 15:21:51.467487][22183.476711952], took 0.07min 2025-12-04T15:21:51.4917924Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-6cf9ed264c8fa189.xml 2025-12-04T15:21:51.5354491Z Running test_binary_ufuncs 1/1 ... 
[2025-12-04 15:21:51.534999][22183.544224671] 2025-12-04T15:21:51.5355060Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:21:51.5357798Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_binary_ufuncs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:51.535354] 2025-12-04T15:26:18.5305565Z 2025-12-04T15:26:18.5309264Z test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_binary_ufuncs_1.1_d43f59e69a692663_.log 2025-12-04T15:26:19.0818977Z Running 12917 items in this shard: test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_broadcast_empty_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_with_tail_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addcmul_scalars_as_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addsub_half_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_edgecases_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_scalar_device_unspecified_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_ops_with_scalars_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bool_tensor_comparison_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cmul_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpu_tensor_pow_cuda_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cremainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_binary_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_inplace_error_msg_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_csub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cuda_tensor_pow_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cumulative_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_script_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divmul_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_exceptions_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_idiv_and_ifloordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_division_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_dunders_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_and_float_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_tensor_pow_neg_ints_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_with_nontrivial_alignment_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_long_tensor_pow_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_forward_ad_float32_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_chalf_tensor_and_cpu_scalar_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_bfloat16_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_trunc_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_lt_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_floor_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logaddexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_w_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_out_resize_warning_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_inplace_resizing_exception_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_base_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_overloads_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_overflow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rpow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lcm_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_typing_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_tensor_pow_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___radd___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rand___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rdiv___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmod___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmul___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___ror___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rpow___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rsub___cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rxor___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_eq_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmin_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_pow_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_eq_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmin_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_return_by_ref_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lt_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_max_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_min_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_u_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_h_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_he_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_laguerre_polynomial_l_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_legendre_polynomial_p_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_u_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_bfloat16_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_gradients_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_uint8
2025-12-04T15:26:19.6614737Z
2025-12-04T15:26:19.6615044Z Finished test_binary_ufuncs 1/1 ... [2025-12-04 15:26:18.553447][22450.562663354], took 4.45min
2025-12-04T15:26:19.6616057Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-510898c7a9dfb9c9.xml
2025-12-04T15:26:19.6616978Z Running test_modules 2/4 ... [2025-12-04 15:26:18.966945][22450.976162666]
2025-12-04T15:26:19.6617398Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:26:19.6618394Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_modules.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:26:18.967409]
2025-12-04T15:34:16.1094400Z
2025-12-04T15:34:16.1095638Z test_modules 2/4 was successful, full logs can be found in artifacts with path test/test-reports/test_modules_2.4_d8a3e6157b79afbb_.log
2025-12-04T15:34:16.1445163Z Running 909 items in this shard: test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardswish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_check_inplace_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BCELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BCEWithLogitsLoss_cuda_float64,
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm2d_eval_mode_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_eval_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose1d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_FractionalMaxPool3d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LayerNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LazyConv2d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_device_ctx_init_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm2d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm2d_train_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_errors_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_errors_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_AvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm2d_train_mode_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_AdaptiveAvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose1d_cuda_complex128, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose2d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_forward_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Embedding_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRUCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_GaussianNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_GroupNorm_cuda_float16, test/test_modules.py::TestModuleCUDA::test_forward_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_InstanceNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool1d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LayerNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiLabelSoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_PReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReplicationPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ReplicationPad3d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_forward_nn_SmoothL1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_forward_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_grad_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_AvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_BatchNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CircularPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LSTM_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LayerNorm_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_grad_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_TransformerEncoder_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_Transformer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_grad_nn_ZeroPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Embedding_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_FractionalMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_RNN_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_SELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveAvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AvgPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_AvgPool1d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BCELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_BatchNorm2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_GRU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_InstanceNorm1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_InstanceNorm1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_L1Loss_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LPPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LPPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_MultiheadAttention_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_PoissonNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_RNN_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerEncoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_TransformerEncoder_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_if_train_and_eval_modes_differ_nn_ZeroPad3d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveAvgPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_AvgPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_BatchNorm3d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Bilinear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CircularPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConstantPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose2d_cuda_complex32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ConvTranspose3d_cuda_complex128, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardswish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LPPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_LogSigmoid_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MarginRankingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_NLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_PReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNNCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_RNN_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_SoftMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softmin_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoderLayer_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_memory_format_nn_ZeroPad3d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_BatchNorm3d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose1d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CosineEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Embedding_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GLU_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GroupNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm1d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_InstanceNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LPPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LSTM_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LazyConvTranspose2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LocalResponseNorm_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LogSigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_LogSoftmax_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MSELoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Mish_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReflectionPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ReplicationPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softplus_cuda_float64, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Softshrink_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerEncoder_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_AdaptiveMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_CircularPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Conv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose3d_cuda_complex128, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ELU_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GRU_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_GaussianNLLLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Hardshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HingeEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_L1Loss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_L1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LPPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConv3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LazyConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Linear_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiLabelMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_PoissonNLLLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReplicationPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_ReplicationPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_SoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softsign_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Softsign_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerDecoderLayer_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoderLayer_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoderLayer_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoder_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_AdaptiveMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_BatchNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_CELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CircularPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConstantPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Conv1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_Conv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Conv3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_CosineEmbeddingLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ELU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_FractionalMaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRUCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_GRU_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Hardtanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Hardtanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_HuberLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm1d_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_InstanceNorm3d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_KLDivLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_KLDivLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LSTMCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_LazyConvTranspose2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_LeakyReLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Linear_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_repr_nn_MSELoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_MarginRankingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_RNN_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReLU6_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReflectionPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ReplicationPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_SELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_SiLU_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Softmin_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Tanh_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_Threshold_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerDecoderLayer_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_repr_nn_ZeroPad3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool1d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveAvgPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AdaptiveMaxPool2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_AvgPool2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BCEWithLogitsLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BCEWithLogitsLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm1d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm2d_eval_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_BatchNorm3d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Bilinear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CTCLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CTCLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConstantPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConstantPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose2d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_complex64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ConvTranspose3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CrossEntropyLoss_cuda_float16, test/test_modules.py::TestModuleCUDA::test_save_load_nn_CrossEntropyLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_FractionalMaxPool3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_FractionalMaxPool3d_cuda_float64, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_GELU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GRU_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_GroupNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_HingeEmbeddingLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_HuberLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm2d_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_InstanceNorm2d_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LPPool1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LSTMCell_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LSTM_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv1d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv1d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConv2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LazyConvTranspose3d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Linear_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Linear_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LocalResponseNorm_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_LogSoftmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MaxPool3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Mish_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiLabelMarginLoss_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_MultiLabelSoftMarginLoss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_PoissonNLLLoss_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_save_load_nn_RMSNorm_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RNNCell_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_RNN_eval_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReflectionPad2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReflectionPad2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ReplicationPad3d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SiLU_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Sigmoid_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Sigmoid_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_SmoothL1Loss_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax2d_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax2d_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softmax_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softplus_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Softsign_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanh_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanhshrink_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Tanhshrink_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_Threshold_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoderLayer_train_mode_cuda_float64, test/test_modules.py::TestModuleCUDA::test_save_load_nn_TransformerEncoder_train_mode_cuda_float32, test/test_modules.py::TestModuleCUDA::test_save_load_nn_ZeroPad2d_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AdaptiveAvgPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_AvgPool3d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCELoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BCELoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm1d_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_BatchNorm2d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CTCLoss_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CircularPad3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConstantPad1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Conv3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ConvTranspose1d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_CrossEntropyLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ELU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GRUCell_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GRU_train_mode_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_GroupNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Hardswish_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_HingeEmbeddingLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm1d_train_mode_swap_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_InstanceNorm2d_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_L1Loss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LayerNorm_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LayerNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Linear_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LocalResponseNorm_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_LogSigmoid_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MaxPool3d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_MultiheadAttention_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_PoissonNLLLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RNNCell_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_RNN_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReLU_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReLU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReflectionPad1d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ReplicationPad2d_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SELU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SiLU_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_SoftMarginLoss_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softmin_swap_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softplus_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Softshrink_swap_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Tanh_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_Threshold_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerDecoderLayer_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_TransformerEncoderLayer_train_mode_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_empty_nn_ZeroPad2d_swap_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveAvgPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AdaptiveMaxPool1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_AvgPool3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCELoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BCEWithLogitsLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_train_mode_swap_False_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm2d_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_BatchNorm3d_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Bilinear_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Bilinear_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CELU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CircularPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConstantPad3d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Conv3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ConvTranspose2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CosineEmbeddingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_CrossEntropyLoss_swap_True_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool2d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_FractionalMaxPool3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GELU_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GRU_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_GroupNorm_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardswish_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Hardtanh_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_HuberLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_eval_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm1d_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_eval_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_InstanceNorm3d_train_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_KLDivLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_KLDivLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_KLDivLoss_swap_True_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LPPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTMCell_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LSTM_train_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LeakyReLU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LeakyReLU_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Linear_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSigmoid_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_LogSoftmax_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MSELoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MarginRankingLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool1d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool2d_swap_True_set_grad_False_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_MaxPool2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiMarginLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_MultiMarginLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_NLLLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_NLLLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PReLU_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PReLU_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_PoissonNLLLoss_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RMSNorm_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNNCell_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNNCell_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_eval_mode_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_RNN_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReLU6_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad2d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad3d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReflectionPad3d_swap_True_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad1d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad2d_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad3d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ReplicationPad3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SELU_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SiLU_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SmoothL1Loss_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SmoothL1Loss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_SoftMarginLoss_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmax_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmin_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softmin_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softplus_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softshrink_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Softsign_swap_True_set_grad_True_cuda_float32, 
test/test_modules.py::TestModuleCUDA::test_to_nn_Tanh_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanhshrink_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Tanhshrink_swap_True_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Threshold_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Threshold_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_eval_mode_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_train_mode_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_TransformerEncoder_train_mode_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_Transformer_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad1d_swap_False_set_grad_True_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad1d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad2d_swap_False_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad3d_swap_True_set_grad_False_cuda_float32, test/test_modules.py::TestModuleCUDA::test_to_nn_ZeroPad3d_swap_True_set_grad_True_cuda_float32 2025-12-04T15:34:16.1782824Z 2025-12-04T15:34:16.1783105Z Finished test_modules 2/4 ... [2025-12-04 15:34:16.110571][22928.119794958], took 7.95min 2025-12-04T15:34:16.1784154Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_modules/test_modules-1ceed37f0450876d.xml 2025-12-04T15:34:16.2354403Z Running torch_np/numpy_tests/linalg/test_linalg 1/1 ... 
[2025-12-04 15:34:16.235040][22928.244262048] 2025-12-04T15:34:16.2355095Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:16.2358071Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/linalg/test_linalg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:16.235398] 2025-12-04T15:34:27.8704153Z 2025-12-04T15:34:27.8705516Z torch_np/numpy_tests/linalg/test_linalg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.linalg.test_linalg_1.1_3f3446ecd43fd597_.log 2025-12-04T15:34:27.8806186Z Running 268 items in this shard: test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size_k, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_sq_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_identity, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_sq_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_basic_nonsvd, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_nan, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_stacked_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_nonsq_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_zero, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_0_n_rhs_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_1, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_2_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_future_rcond, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_incompatible_dims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_invalid, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_empty_herm_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_invalid, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNorm_NonSystematic::test_intmin, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_2x2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_matrix_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_reduced_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_symmetric_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_all_but_economic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_raw, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_3_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt0, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt0, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt0, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_0_size, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype3, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_byteorder_check, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_generalized_raise_multiloop, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_sdot_bug_8577, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_xerbla_override, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_dynamic_programming_optimization, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_three_arguments, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_two_arguments, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_logic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_optimization_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_three_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_too_few_input_arrays, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_two_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_and_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_-2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_result, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a0_axes0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a1_axes1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_dot, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_geqrf_lwork_smoketest, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_unsupported_commontype 2025-12-04T15:34:27.8905058Z 2025-12-04T15:34:27.8905486Z Finished torch_np/numpy_tests/linalg/test_linalg 1/1 ... [2025-12-04 15:34:27.870514][22939.879736442], took 0.19min 2025-12-04T15:34:27.8958586Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-320a7bc7a2da135c.xml 2025-12-04T15:34:27.9893105Z Running torch_np/numpy_tests/core/test_dtype 1/1 ... [2025-12-04 15:34:27.988940][22939.998162484] 2025-12-04T15:34:27.9893933Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:27.9896320Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_dtype.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:34:27.989251] 2025-12-04T15:34:32.0117001Z 2025-12-04T15:34:32.0118044Z torch_np/numpy_tests/core/test_dtype 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_dtype_1.1_bb9947961cd52757_.log 2025-12-04T15:34:32.0164835Z Running 102 items in this shard: test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_equivalent_dtype_hashing, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_invalid_types, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Bool, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Bytes0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex128, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Datetime64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float128, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int64, 
test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int8, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Object0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Str0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Timedelta64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt8, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Uint32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Uint64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Void0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation1, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation2, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation3, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_equality, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t1, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t2, 
test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t3, test/torch_np/numpy_tests/core/test_dtype.py::TestDtypeAttributeDeletion::test_dtype_non_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_dtype.py::TestDtypeAttributeDeletion::test_dtype_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t0, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t1, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t2, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t3, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t4, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_DType11, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_bool__10, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_complex128_4, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_complex64_3, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float16_0, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float32_1, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float64_2, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int16_7, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int32_8, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int64_9, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int8_6, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_uint8_5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_complex64_complex64_None, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_float16_complex64_None, 
test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_float32_complex64_None, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_other_4294967295_expected1_expected_weak1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_other_65535_expected0_expected_weak0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other0_expected0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other1_expected1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other2_expected2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other3_expected3, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other4_expected4, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other5_expected5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other6_expected6, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes0_expected0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes1_expected1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes2_expected2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes3_expected3, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes4_expected4, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes5_expected5, 
test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes6_expected6, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes7_expected7, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes8_expected8, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes9_expected9, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_18446744073709551616, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_200, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_4294967296, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_9223372036854775808, test/torch_np/numpy_tests/core/test_dtype.py::TestMisc::test_dtypes_are_true, test/torch_np/numpy_tests/core/test_dtype.py::TestMisc::test_keyword_argument, test/torch_np/numpy_tests/core/test_dtype.py::TestFromDTypeAttribute::test_recursion, test/torch_np/numpy_tests/core/test_dtype.py::TestFromDTypeAttribute::test_simple, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_?, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_B, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_D, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_F, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_b, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_d, 
test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_e, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_f, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_h, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_i, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_l, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_scalar, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_0, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_1, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_2, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_3 2025-12-04T15:34:32.0209425Z 2025-12-04T15:34:32.0209903Z Finished torch_np/numpy_tests/core/test_dtype 1/1 ... [2025-12-04 15:34:32.011476][22944.020698739], took 0.07min 2025-12-04T15:34:32.0368067Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-9c6a851d43187f63.xml 2025-12-04T15:34:32.0748314Z Running lazy/test_debug_util 1/1 ... [2025-12-04 15:34:32.074465][22944.083689789] 2025-12-04T15:34:32.0748951Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:32.0752074Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_debug_util.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:34:32.074834] 2025-12-04T15:34:35.8464023Z 2025-12-04T15:34:35.8464997Z lazy/test_debug_util 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_debug_util_1.1_6159721dd42cd649_.log 2025-12-04T15:34:35.8466112Z Running 1 items in this shard: test/lazy/test_debug_util.py::DebugUtilTest::test_get_python_frames 2025-12-04T15:34:35.8466585Z 2025-12-04T15:34:35.8467072Z Finished lazy/test_debug_util 1/1 ... [2025-12-04 15:34:35.846085][22947.855309677], took 0.06min 2025-12-04T15:34:35.8712749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-612fe6974f2e86fb.xml 2025-12-04T15:34:35.9233223Z Running nn/test_load_state_dict 1/1 ... [2025-12-04 15:34:35.922971][22947.932196328] 2025-12-04T15:34:35.9234066Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:35.9236376Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_load_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:34:35.923285] 2025-12-04T15:34:40.2456512Z 2025-12-04T15:34:40.2458035Z nn/test_load_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_load_state_dict_1.1_1f7336ad32e96ae1_.log 2025-12-04T15:34:40.2470892Z Running 29 items in this shard: test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_BC_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_BC_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_False_keep_vars_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_False_keep_vars_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_True_keep_vars_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_True_keep_vars_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_shape_stride_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_shape_stride_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_with_optimizer_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_with_optimizer_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_child_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_child_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_custom_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_custom_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_invalid_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_invalid_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_ref_cycle_swap_False, 
test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_type_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_type_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_warn_assign_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_warn_assign_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_with_unexpected_key_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_with_unexpected_key_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_scalar_param_1d_tensor_raises_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_scalar_param_1d_tensor_raises_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDictSwap::test_swap_subclass_swap_True_assign_False, test/nn/test_load_state_dict.py::TestLoadStateDictSwap::test_swap_subclass_swap_True_assign_True 2025-12-04T15:34:40.2482939Z 2025-12-04T15:34:40.2483376Z Finished nn/test_load_state_dict 1/1 ... [2025-12-04 15:34:40.245313][22952.254538145], took 0.07min 2025-12-04T15:34:40.2706517Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-573eaa6de6818c33.xml 2025-12-04T15:34:40.3058571Z Running test_shape_ops 1/1 ... [2025-12-04 15:34:40.305419][22952.314643888] 2025-12-04T15:34:40.3059323Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:40.3060775Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_shape_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:34:40.305721] 2025-12-04T15:34:45.5314239Z 2025-12-04T15:34:45.5315260Z test_shape_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_shape_ops_1.1_17556160abffc005_.log 2025-12-04T15:34:45.5344528Z Running 99 items in this shard: test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_propagates_nans_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_raises_arg_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_complex_rot90_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_complex_rot90_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_diag_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_diag_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_diagonal_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_diagonal_multidim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_bool, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_large_tensor_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_unsupported_dtype_cuda_quint2x4, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_unsupported_dtype_cuda_quint4x2, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_float64, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_astuple_out_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_discontiguous_cuda, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_no_warning_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_non_diff_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_rot90_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_tolist_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_unbind_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_all_devices_and_dtypes_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_backward_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_scalars_cuda 2025-12-04T15:34:45.5372859Z 2025-12-04T15:34:45.5373109Z Finished test_shape_ops 1/1 ... [2025-12-04 15:34:45.531050][22957.540274024], took 0.09min 2025-12-04T15:34:45.5567659Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_shape_ops/test_shape_ops-8ae5e584fb53bb5e.xml 2025-12-04T15:34:45.6071124Z Running profiler/test_memory_profiler 1/1 ... 
[2025-12-04 15:34:45.606777][22957.616001326] 2025-12-04T15:34:45.6071878Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:45.6075659Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_memory_profiler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:45.607097] 2025-12-04T15:34:53.3864828Z 2025-12-04T15:34:53.3865910Z profiler/test_memory_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_memory_profiler_1.1_f20e3ab107ff598c_.log 2025-12-04T15:34:53.3880604Z Running 33 items in this shard: test/profiler/test_memory_profiler.py::TestMemoryProfiler::test_config_check, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module_and_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer_set_to_none, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_low_level, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_complicated, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_non_op_allocations, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_inplace, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_stacked, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_with_annotations, 
test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_tensorlist, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_lazy, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_lazily_initialized, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_manual_optimizer_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_memory_timeline, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients_set_to_none, test/profiler/test_memory_profiler.py::TestMemoryProfilerTimelineCUDA::test_memory_timeline_no_id_cuda 2025-12-04T15:34:53.3894827Z 2025-12-04T15:34:53.3895283Z Finished profiler/test_memory_profiler 1/1 ... 
[2025-12-04 15:34:53.386130][22965.395353946], took 0.13min 2025-12-04T15:34:53.4120710Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-419c9aea1e4e06f2.xml 2025-12-04T15:34:53.4851715Z Running test_indexing 1/1 ... [2025-12-04 15:34:53.484791][22965.494012062] 2025-12-04T15:34:53.4852429Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:53.4856762Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_indexing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:53.485183] 2025-12-04T15:35:16.7497334Z 2025-12-04T15:35:16.7498483Z test_indexing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_indexing_1.1_fbbd66d5cf2cd3ea_.log 2025-12-04T15:35:16.7558464Z Running 186 items in this shard: test/test_indexing.py::TestIndexingCUDA::test_advancedindex_big_cuda, test/test_indexing.py::TestIndexingCUDA::test_advancedindex_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_advancedindex_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_basic_advanced_combined_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_indices_accumulate_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_mask_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask2d_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask_accumulate_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_tensor_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_cpu_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_cuda_broadcast_index_use_deterministic_algorithms_cuda, 
test/test_indexing.py::TestIndexingCUDA::test_ellipsis_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_ndim_index_bool_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_ndim_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_slice_cuda, test/test_indexing.py::TestIndexingCUDA::test_errors_index_copy_cuda, test/test_indexing.py::TestIndexingCUDA::test_gather_take_along_dim_cross_device_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_getitem_scalars_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_add_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float16, 
test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_getitem_copy_bools_slices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_ind_dtype_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_limits_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_duplicate_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_empty_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_expanded_values_cuda, 
test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_large_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_non_contiguous_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_deterministic_with_optional_tensors_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_large_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_non_accumulate_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float8_e4m3fn, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float8_e5m2, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_bfloat16, 
test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int32, 
test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_scalar_with_bool_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e4m3fn, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e4m3fnuz, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e5m2, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e5m2fnuz, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_setitem_bools_slices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_int_assignment_cuda, 
test/test_indexing.py::TestIndexingCUDA::test_int_indices2d_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices_broadcast_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_invalid_device_cuda, test/test_indexing.py::TestIndexingCUDA::test_invalid_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_jit_indexing_cuda, test/test_indexing.py::TestIndexingCUDA::test_list_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_multi_dimensional_bool_mask_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_multi_dimensional_bool_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_bool_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_byte_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_int_cuda, test/test_indexing.py::TestIndexingCUDA::test_none_cuda, test/test_indexing.py::TestIndexingCUDA::test_out_of_bound_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_set_item_to_scalar_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_setitem_expansion_error_cuda, test/test_indexing.py::TestIndexingCUDA::test_setitem_scalars_cuda, test/test_indexing.py::TestIndexingCUDA::test_single_int_cuda, test/test_indexing.py::TestIndexingCUDA::test_step_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_step_cuda, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_invalid_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_invalid_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_unravel_index_errors_cuda, test/test_indexing.py::TestIndexingCUDA::test_variable_slicing_cuda, test/test_indexing.py::TestIndexingCUDA::test_zero_dim_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_assignment_value_mismatch_cuda, 
test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_alldims_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_onedim_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_twodim_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_weirdness_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_weirdness_tensors_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_list_indexing_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_shape_mismatch_cuda, test/test_indexing.py::NumpyTestsCUDA::test_broadcast_subspace_cuda, test/test_indexing.py::NumpyTestsCUDA::test_broaderrors_indexing_cuda, test/test_indexing.py::NumpyTestsCUDA::test_ellipsis_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_empty_fancy_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_empty_tuple_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_everything_returns_views_cuda, test/test_indexing.py::NumpyTestsCUDA::test_index_is_larger_cuda, test/test_indexing.py::NumpyTestsCUDA::test_index_no_floats_cuda, test/test_indexing.py::NumpyTestsCUDA::test_none_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_single_bool_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_single_int_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_trivial_fancy_out_of_bounds_cuda, test/test_indexing.py::NumpyTestsCUDA::test_truncate_leading_1s_cuda 2025-12-04T15:35:16.7617366Z 2025-12-04T15:35:16.7617754Z Finished test_indexing 1/1 ... [2025-12-04 15:35:16.749698][22988.758921109], took 0.39min 2025-12-04T15:35:16.7756747Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_indexing/test_indexing-bb3db4f55bab2e87.xml 2025-12-04T15:35:16.8707512Z Running test_type_info 1/1 ... 
[2025-12-04 15:35:16.870278][22988.879502193] 2025-12-04T15:35:16.8708197Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:35:16.8709979Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_info.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:16.870630] 2025-12-04T15:35:20.5418978Z 2025-12-04T15:35:20.5419988Z test_type_info 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_info_1.1_02020d4e7679db8b_.log 2025-12-04T15:35:20.5422469Z Running 5 items in this shard: test/test_type_info.py::TestDTypeInfo::test_finfo, test/test_type_info.py::TestDTypeInfo::test_iinfo, test/test_type_info.py::TestDTypeInfo::test_invalid_input, test/test_type_info.py::TestDTypeInfo::test_to_complex, test/test_type_info.py::TestDTypeInfo::test_to_real 2025-12-04T15:35:20.5423794Z 2025-12-04T15:35:20.5424081Z Finished test_type_info 1/1 ... [2025-12-04 15:35:20.541522][22992.550745742], took 0.06min 2025-12-04T15:35:20.5677951Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_info/test_type_info-3cbecfd6afe8711f.xml 2025-12-04T15:35:20.6050101Z Running functorch/test_aotdispatch 1/1 ... [2025-12-04 15:35:20.604648][22992.613872504] 2025-12-04T15:35:20.6050736Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:35:20.6054029Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_aotdispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:35:20.605005] 2025-12-04T15:37:22.8417134Z 2025-12-04T15:37:22.8420883Z functorch/test_aotdispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_aotdispatch_1.1_73fa05bc552fde2d_.log 2025-12-04T15:37:22.8667981Z Running 537 items in this shard: test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs_create_graph, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_shape_output_not_in_bw_graph, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_hidden_from_autograd_aliasing, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_aliase_custom_autograd_function, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_module, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_nested_grad, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_single, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_detach, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_ban_dropout_mut_pre_dispatch, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_multiple_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_no_buffer_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_functionalized_rng_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_dupes_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_input_requiring_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_parameter_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_metadata_mutation_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_module_joint, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_multiple_outputs_require_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_buffer_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_inplace, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_linear, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_contiguous, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_conv_and_bn, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_composite_implicit, 
test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_simple, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_view, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_1, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_2, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_outdtype, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_reshape, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_autograd_op, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond_nested, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_basic, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_pytrees_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_synthetic_bases_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_unbacked_arg, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_with_torch_cond, test/functorch/test_aotdispatch.py::TestPartitioning::test_autocast, test/functorch/test_aotdispatch.py::TestPartitioning::test_contiguous, test/functorch/test_aotdispatch.py::TestPartitioning::test_custom_partitioner_fn, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_getitem, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_output_tensor_shape_tensor, test/functorch/test_aotdispatch.py::TestPartitioning::test_generate_gives_inference_graph, test/functorch/test_aotdispatch.py::TestPartitioning::test_meta_tensor_inplace_op, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner, 
test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_output_tensor_shape_tensor, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_raise_getitems, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_save_shape, test/functorch/test_aotdispatch.py::TestPartitioning::test_preserve_random, test/functorch/test_aotdispatch.py::TestPartitioning::test_quantize_activation_duplicate_nodes, test/functorch/test_aotdispatch.py::TestPartitioning::test_recompute_partitioning, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_incorrect_backward, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_inference, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad_views, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_simple, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_dynamic, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_fake_tensor_gm_raises, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace_from_mutation, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_test_subclasses_with_tensor_factories, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_flex_attn_noncontiguous_tangents, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_dense, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_tensor_tangent, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inductor_freezing_with_subclasses, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inference_python_dispatcher, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_layer_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_lift_fresh_copy_in_graph, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cpu, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rms_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_all, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_donated, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_no_static, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_donated_buffers, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_params, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_recompile, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters_torture_case, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_tangent_type_coercion, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_wrong_guess_tangent_type, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_custom, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_embedding_bag_view_dynamic, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_false_aliasing, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_module, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_different_grad, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_outputs_are_aliased, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_detach, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_on_grad_out, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_shape_output_not_in_bw_graph, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_batchnorm, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_before_set_, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mem_leak_from_save_for_bw, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_module, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_different_grad, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_outputs_are_aliased, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_detach 2025-12-04T15:37:22.8911102Z 2025-12-04T15:37:22.8911490Z Finished functorch/test_aotdispatch 1/1 ... 
[2025-12-04 15:37:22.842646][23114.85186094], took 2.04min 2025-12-04T15:37:22.8912790Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-3265775c77799c99.xml 2025-12-04T15:37:22.9596882Z Running test_scatter_gather_ops 1/1 ... [2025-12-04 15:37:22.959339][23114.968561261] 2025-12-04T15:37:22.9597595Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:37:22.9600415Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_scatter_gather_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:37:22.959670] 2025-12-04T15:37:42.5608776Z 2025-12-04T15:37:42.5609667Z test_scatter_gather_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_scatter_gather_ops_1.1_e624bed173f96ebf_.log 2025-12-04T15:37:42.5641742Z Running 76 items in this shard: test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_bool_cuda_bool, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float32, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_broadcasted_index_deterministic_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_mult_index_base_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_bfloat16, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float64, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex128, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int16, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_uint8 2025-12-04T15:37:42.5672660Z 2025-12-04T15:37:42.5672985Z Finished test_scatter_gather_ops 1/1 ... [2025-12-04 15:37:42.560625][23134.569849357], took 0.33min 2025-12-04T15:37:42.5874007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5e8dbe55d5e60a97.xml 2025-12-04T15:37:42.6604747Z Running test_cuda_multigpu 1/1 ... [2025-12-04 15:37:42.660083][23134.669306695] 2025-12-04T15:37:42.6605415Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:37:42.6609044Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_multigpu.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:37:42.660403] 2025-12-04T15:37:46.9826127Z 2025-12-04T15:37:46.9826993Z test_cuda_multigpu 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_multigpu_1.1_134114cd1fad822a_.log 2025-12-04T15:37:46.9845596Z Running 61 items in this shard: test/test_cuda_multigpu.py::TestCudaMultiGPU::test_autogpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_caching_pinned_memory_multi_gpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cat_autogpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_copy_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_copy_streams, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_device_memory_allocated, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_init_race, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_memory_leak_detection, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_set_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_synchronize, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_current_stream, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_default_stream, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_multi_gpu_elapsed_time, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_multi_gpu_query, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_wait, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_external_streams, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_external_streams_multi_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_get_set_rng_state_all, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_device_as_key, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_multigpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_scale, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_load_nonexistent_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_mem_get_info, 
test/test_cuda_multigpu.py::TestCudaMultiGPU::test_memory_stats, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_memory_stats_multigpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_serialization_remap, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_serialization_remap_dict, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_storage_clone, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_new, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_rng_state_offset, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_context, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_event_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_event_nogil, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streaming_backwards_device_transfer, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu_eq, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu_query, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_priority, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_tensor_device, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced_dense_only, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced_empty_tensors, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_cpu, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_gpu, test/test_cuda_multigpu.py::TestCudaComm::test_gather, test/test_cuda_multigpu.py::TestCudaComm::test_gather_dim, test/test_cuda_multigpu.py::TestCudaComm::test_gather_namedtuple, test/test_cuda_multigpu.py::TestCudaComm::test_gather_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_memory_format_scatter_gather, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add_coalesced, 
test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add_coalesced_dense_only, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_sizes, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_sizes, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_namedtuple 2025-12-04T15:37:46.9873949Z 2025-12-04T15:37:46.9874221Z Finished test_cuda_multigpu 1/1 ... [2025-12-04 15:37:46.982300][23138.991524685], took 0.07min 2025-12-04T15:37:47.0091817Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-339f2b8a0ba2c562.xml 2025-12-04T15:37:47.0485589Z Running torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 15:37:47.048206][23139.057429969] 2025-12-04T15:37:47.0486254Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:37:47.0489571Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_index_tricks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:37:47.048525] 2025-12-04T15:37:51.0709768Z 2025-12-04T15:37:51.0710858Z torch_np/numpy_tests/lib/test_index_tricks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_a7d224f05328be14_.log 2025-12-04T15:37:51.0729613Z Running 47 items in this shard: test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_big_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_clipmodes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_dtypes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_clip, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_raise, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_unravel, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_writeability, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_longdouble, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npcomplexfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_linspace_equivalence, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start0_stop_10_step0_expected0, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start_-10_stop_20_step1_expected1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_nd, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_sparse, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_1d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_2d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_complex_step, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_more_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdenumerate::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_simple_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_1d_only, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_bool, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_repeated_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_shape_and_dtype, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestC::test_c_, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_hetero_shape_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_low_dim_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_operate_4d_array, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_wide_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndices::test_diag_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_diag_indices_from, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_shape_mismatch, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_small_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdIndex::test_ndindex 2025-12-04T15:37:51.0747699Z 2025-12-04T15:37:51.0748089Z Finished torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 15:37:51.070659][23143.079884086], took 0.07min 2025-12-04T15:37:51.0977333Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-7a9eb44e36e96ef2.xml 2025-12-04T15:37:51.1319042Z Running test_jit_autocast 1/1 ... [2025-12-04 15:37:51.131567][23143.140792043] 2025-12-04T15:37:51.1319692Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:37:51.1322625Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:37:51.131869] 2025-12-04T15:38:17.7185824Z 2025-12-04T15:38:17.7186574Z test_jit_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_autocast_1.1_449f99b0d0d7aa89_.log 2025-12-04T15:38:17.7202638Z Running 54 items in this shard: test/test_jit_autocast.py::TestAutocast::test_autocast_api, test/test_jit_autocast.py::TestAutocast::test_autocast_api_not_supported, test/test_jit_autocast.py::TestAutocast::test_autocast_autodiff, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator_outside_jit, test/test_jit_autocast.py::TestAutocast::test_autocast_mixed_dtypes, test/test_jit_autocast.py::TestAutocast::test_callees, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_off, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_on, test/test_jit_autocast.py::TestAutocast::test_conditional_autocast, test/test_jit_autocast.py::TestAutocast::test_control_flow, test/test_jit_autocast.py::TestAutocast::test_divergent_autocast, test/test_jit_autocast.py::TestAutocast::test_divergent_types, test/test_jit_autocast.py::TestAutocast::test_duplicate_inputs, test/test_jit_autocast.py::TestAutocast::test_eager_and_script, test/test_jit_autocast.py::TestAutocast::test_explicit_casts, test/test_jit_autocast.py::TestAutocast::test_fp32_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_policy_with_fp64, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_ignore_amp, test/test_jit_autocast.py::TestAutocast::test_implicitly_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_inplace, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_cpu, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_gpu, 
test/test_jit_autocast.py::TestAutocast::test_jit_call_method_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_executor_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_basic, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_constants, test/test_jit_autocast.py::TestAutocast::test_jit_generic_autocast, test/test_jit_autocast.py::TestAutocast::test_linear_bf16, test/test_jit_autocast.py::TestAutocast::test_minimal, test/test_jit_autocast.py::TestAutocast::test_minimal_cpu, test/test_jit_autocast.py::TestAutocast::test_minimal_off, test/test_jit_autocast.py::TestAutocast::test_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_promote_policy, test/test_jit_autocast.py::TestAutocast::test_promote_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_reused_autocast, test/test_jit_autocast.py::TestAutocast::test_reused_autocast_expr, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state_expr, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing_with_autocast, test/test_jit_autocast.py::TestAutocast::test_script_module, test/test_jit_autocast.py::TestAutocast::test_tracing_and_script, test/test_jit_autocast.py::TestAutocast::test_tracing_with_autocast_and_script, test/test_jit_autocast.py::TestJitTraceAutocast::test_cat_promote, test/test_jit_autocast.py::TestJitTraceAutocast::test_generate_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nchw_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nhwc_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cpu, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cuda, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_enable_and_check, 
test/test_jit_autocast.py::TestJitTraceAutocast::test_scripted_aliasing 2025-12-04T15:38:17.7218475Z 2025-12-04T15:38:17.7218742Z Finished test_jit_autocast 1/1 ... [2025-12-04 15:38:17.718298][23169.727522524], took 0.44min 2025-12-04T15:38:17.7458545Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-8a1338a601c4ef0b.xml 2025-12-04T15:38:17.8683418Z Running test_xnnpack_integration 1/1 ... [2025-12-04 15:38:17.867987][23169.877209737] 2025-12-04T15:38:17.8683908Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:38:17.8686858Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:17.868302] 2025-12-04T15:38:29.6903009Z 2025-12-04T15:38:29.6904032Z test_xnnpack_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_1.1_ef1a45d9c52ae3ce_.log 2025-12-04T15:38:29.6908794Z Running 12 items in this shard: test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear_1d_input, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_combined_model, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_decomposed_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_linear, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_basic, 
test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_with_relu_fc 2025-12-04T15:38:29.6913188Z 2025-12-04T15:38:29.6913499Z Finished test_xnnpack_integration 1/1 ... [2025-12-04 15:38:29.689941][23181.699165299], took 0.20min 2025-12-04T15:38:29.7172629Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-d08ca7b1f6355251.xml 2025-12-04T15:38:29.7982270Z Running nn/test_init 1/1 ... [2025-12-04 15:38:29.797766][23181.80698916] 2025-12-04T15:38:29.7982715Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:38:29.7984974Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:29.798094] 2025-12-04T15:38:37.2798835Z 2025-12-04T15:38:37.2799704Z nn/test_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_init_1.1_414026fa8e0e69bb_.log 2025-12-04T15:38:37.2808642Z Running 30 items in this shard: test/nn/test_init.py::TestNNInit::test_calculate_gain_leaky_relu, test/nn/test_init.py::TestNNInit::test_calculate_gain_leaky_relu_only_accepts_numbers, test/nn/test_init.py::TestNNInit::test_calculate_gain_linear, test/nn/test_init.py::TestNNInit::test_calculate_gain_nonlinear, test/nn/test_init.py::TestNNInit::test_calculate_gain_only_accepts_valid_nonlinearities, test/nn/test_init.py::TestNNInit::test_constant, test/nn/test_init.py::TestNNInit::test_deprecation, test/nn/test_init.py::TestNNInit::test_dirac_identity, test/nn/test_init.py::TestNNInit::test_dirac_only_works_on_3_4_5d_inputs, test/nn/test_init.py::TestNNInit::test_dirac_properties, test/nn/test_init.py::TestNNInit::test_eye, test/nn/test_init.py::TestNNInit::test_eye_only_works_on_2d_inputs, test/nn/test_init.py::TestNNInit::test_kaiming_normal, 
test/nn/test_init.py::TestNNInit::test_kaiming_normal_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_kaiming_normal_warning_on_0element_tensor, test/nn/test_init.py::TestNNInit::test_kaiming_uniform, test/nn/test_init.py::TestNNInit::test_kaiming_uniform_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_kaiming_uniform_warning_on_0element_tensor, test/nn/test_init.py::TestNNInit::test_normal, test/nn/test_init.py::TestNNInit::test_ones_and_zeros, test/nn/test_init.py::TestNNInit::test_orthogonal, test/nn/test_init.py::TestNNInit::test_sparse_default_std, test/nn/test_init.py::TestNNInit::test_sparse_only_works_on_2d_inputs, test/nn/test_init.py::TestNNInit::test_trunc_normal, test/nn/test_init.py::TestNNInit::test_trunc_normal_generator, test/nn/test_init.py::TestNNInit::test_uniform, test/nn/test_init.py::TestNNInit::test_xavier_normal, test/nn/test_init.py::TestNNInit::test_xavier_normal_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_xavier_uniform, test/nn/test_init.py::TestNNInit::test_xavier_uniform_errors_on_inputs_smaller_than_2d
2025-12-04T15:38:37.2816625Z
2025-12-04T15:38:37.2816861Z Finished nn/test_init 1/1 ... [2025-12-04 15:38:37.279566][23189.288790762], took 0.12min
2025-12-04T15:38:37.3068310Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_init/nn.test_init-bb3f84e769cc626f.xml
2025-12-04T15:38:37.3809953Z Running test_mobile_optimizer 1/1 ... [2025-12-04 15:38:37.380572][23189.389795872]
2025-12-04T15:38:37.3810595Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:38:37.3813352Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mobile_optimizer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:37.380954]
2025-12-04T15:38:43.1062051Z
2025-12-04T15:38:43.1063005Z test_mobile_optimizer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mobile_optimizer_1.1_2406b12c26273884_.log
2025-12-04T15:38:43.1066041Z Running 7 items in this shard: test/test_mobile_optimizer.py::TestOptimizer::test_clone_module_with_class, test/test_mobile_optimizer.py::TestOptimizer::test_generate_mobile_module_lints, test/test_mobile_optimizer.py::TestOptimizer::test_hoist_conv_packed_params, test/test_mobile_optimizer.py::TestOptimizer::test_mobilenet_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_preserve_bundled_inputs_methods, test/test_mobile_optimizer.py::TestOptimizer::test_quantized_conv_no_asan_failures
2025-12-04T15:38:43.1068597Z
2025-12-04T15:38:43.1334824Z Finished test_mobile_optimizer 1/1 ... [2025-12-04 15:38:43.105831][23195.115055521], took 0.10min
2025-12-04T15:38:43.1336921Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-081f0752aeda15ae.xml
2025-12-04T15:38:43.1674996Z Running test_type_promotion 1/1 ... [2025-12-04 15:38:43.167094][23195.176319636]
2025-12-04T15:38:43.1675598Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:38:43.1677474Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_promotion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:43.167401]
2025-12-04T15:38:56.7062658Z
2025-12-04T15:38:56.7064771Z test_type_promotion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_promotion_1.1_a64bbb5536dae6ab_.log
2025-12-04T15:38:56.7244310Z Running 423 items in this shard: test/test_type_promotion.py::TestTypePromotionCUDA::test_add_wrapped_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alpha_mismatch_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alternate_result_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_bfloat16_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_booleans_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_can_cast_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_out_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float32,
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_comparison_ops_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_assertraises_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_scalar_mult_tensor_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_computation_ignores_out_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_create_bool_tensors_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_float_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_from_issue_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_fail_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_inplace_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_to_float_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_lt_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_many_promotions_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_mixed_type_backward_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_non_promoting_ops_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int16, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex128, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_self_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_types_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float32, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bool, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_uint8, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_tensor_vs_scalar_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_add_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_mul_cuda, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_sub_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_ternary_out_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_transpose_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex64, 
test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unsigned_cuda 2025-12-04T15:38:56.7421474Z 2025-12-04T15:38:56.7421758Z Finished test_type_promotion 1/1 ... [2025-12-04 15:38:56.706711][23208.715935289], took 0.23min 2025-12-04T15:38:56.7422772Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_promotion/test_type_promotion-3f39f26aca555a70.xml 2025-12-04T15:38:58.2318947Z Uploading artifacts took 1.41 seconds 2025-12-04T15:38:58.2322441Z Running test_reductions 1/1 ... [2025-12-04 15:38:58.231933][23210.241156073] 2025-12-04T15:38:58.2322921Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:38:58.2326710Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:38:58.232331] 2025-12-04T15:41:45.1273942Z 2025-12-04T15:41:45.1274723Z test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_reductions_1.1_4c27d813839f98a0_.log 2025-12-04T15:41:45.3169726Z Running 4759 items in this shard: test/test_reductions.py::TestReductionsCUDA::test_accreal_type_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_all_any_with_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_issue117215_cuda, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_amin_amax_some_dims_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_axis_with_dim_one_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_large_axis_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_bincount_cuda, test/test_reductions.py::TestReductionsCUDA::test_bucketization_cuda, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_cumprod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_cumsum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_unbiased_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_prod_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_logsumexp_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_mean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_sum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_mean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amax_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_std_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nanmean_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_less_than_64_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_linalg_vector_norm_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_var_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nansum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_sum_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_linalg_vector_norm_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_all_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amin_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histc_value_corner_cases_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_value_corner_cases_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histogram_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histogram_error_handling_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histogramdd_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_integral_promotion_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_max_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mean_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_mean_int_with_optdtype_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_corner_cases_cuda, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_min_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_max_nan_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_minmax_illegal_dtype_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_mode_boolean_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_device_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_numpy_named_args_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_bool_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_prod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_quantile_backward_cuda, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_quantile_error_cuda, 
test/test_reductions.py::TestReductionsCUDA::test_reduce_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_empty_any_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_split_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amax_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_std_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nanmean_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int32, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_repeated_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_bfloat16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int8, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float32, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float16, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bool, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_uint8, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_scalar_tensor_as_dim_argument_cuda, test/test_reductions.py::TestReductionsCUDA::test_scalar_tensor_dim_compiled_mode_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex128, 
test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_cpu_device_mismatch_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_reduction_uint8_overflow_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_out_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_parallel_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int64, 
test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_argmax_argmix_kthvalue_dim_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_reduce_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_large_input_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_var_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability2_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float64, 
test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float64 2025-12-04T15:41:45.5005914Z 2025-12-04T15:41:45.5006190Z Finished test_reductions 1/1 ... [2025-12-04 15:41:45.134949][23377.144173391], took 2.78min 2025-12-04T15:41:45.5007193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_reductions/test_reductions-31a848701d5079bd.xml 2025-12-04T15:41:45.5008289Z Running test_autoload_disable 1/1 ... [2025-12-04 15:41:45.324617][23377.333838102] 2025-12-04T15:41:45.6606079Z Processing /var/lib/jenkins/workspace/test/cpp_extensions 2025-12-04T15:41:48.9058521Z Preparing metadata (pyproject.toml) ... done 2025-12-04T15:41:48.9079365Z Building wheels for collected packages: torch_test_cpp_extension 2025-12-04T15:43:15.1437700Z Building wheel for torch_test_cpp_extension (pyproject.toml) ... done 2025-12-04T15:43:15.1563988Z Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=13199657 sha256=7c07cad18ea0e6d31f276459cef3d32a7a1ce159eb926509a0f6578be4510701 2025-12-04T15:43:15.1565410Z Stored in directory: /tmp/pip-ephem-wheel-cache-z0r_xujv/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1 2025-12-04T15:43:15.1582836Z Successfully built torch_test_cpp_extension 2025-12-04T15:43:15.5268441Z Installing collected packages: torch_test_cpp_extension 2025-12-04T15:43:15.7424292Z Successfully installed torch_test_cpp_extension-0.0.0 2025-12-04T15:43:18.3791860Z 2025-12-04T15:43:18.3792117Z Running tests... 
2025-12-04T15:43:18.3792408Z ---------------------------------------------------------------------- 2025-12-04T15:43:18.7253808Z . 2025-12-04T15:43:18.7254224Z ---------------------------------------------------------------------- 2025-12-04T15:43:18.7254608Z Ran 1 test in 0.346s 2025-12-04T15:43:18.7254756Z 2025-12-04T15:43:18.7254839Z OK 2025-12-04T15:43:18.7254942Z 2025-12-04T15:43:18.7255043Z Generating XML reports... 2025-12-04T15:43:19.4411623Z Finished test_autoload_disable 1/1 ... [2025-12-04 15:43:19.440603][23471.449818604], took 1.57min 2025-12-04T15:43:19.4690874Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204154318.xml 2025-12-04T15:43:23.3789543Z Running test batch 'tests to run' cost 22539.08 seconds 2025-12-04T15:43:23.3803411Z Emitting td_test_failure_stats_v2 2025-12-04T15:43:23.3806508Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f7712824d12711f081bf0242ac110002 2025-12-04T15:43:23.5211262Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f7712824d12711f081bf0242ac110002 2025-12-04T15:43:23.5228134Z Emitting td_test_failure_stats_v2 2025-12-04T15:43:23.5229613Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f786de9ed12711f081bf0242ac110002 2025-12-04T15:43:23.5535525Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f786de9ed12711f081bf0242ac110002 2025-12-04T15:43:23.5545696Z Emitting td_test_failure_stats_v2 2025-12-04T15:43:23.5547934Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f78bb8d8d12711f081bf0242ac110002 2025-12-04T15:43:23.5887436Z Done! 
Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764863003_f78bb8d8d12711f081bf0242ac110002 2025-12-04T15:43:23.5888474Z inductor/test_fp8 1/1 failed! 2025-12-04T15:43:23.5888761Z test_cuda 1/1 failed! 2025-12-04T15:43:23.5889083Z test_sparse 1/1 failed! 2025-12-04T15:43:24.3319618Z 2025-12-04T15:43:24.3319929Z real 375m45.282s 2025-12-04T15:43:24.3320205Z user 376m19.018s 2025-12-04T15:43:24.3320414Z sys 36m48.682s 2025-12-04T15:43:24.3320641Z + sccache_epilogue 2025-12-04T15:43:24.3320917Z + echo '::group::Sccache Compilation Log' 2025-12-04T15:43:24.3322037Z ##[group]Sccache Compilation Log 2025-12-04T15:43:24.3322382Z + echo '=================== sccache compilation log ===================' 2025-12-04T15:43:24.3322782Z =================== sccache compilation log =================== 2025-12-04T15:43:24.3323407Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log 2025-12-04T15:43:24.3471392Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ===========' 2025-12-04T15:43:24.3472082Z =========== If your build fails, please take a look at the log above for possible reasons =========== 2025-12-04T15:43:24.3472559Z + sccache --show-stats 2025-12-04T15:43:24.3508814Z Compile requests 3479 2025-12-04T15:43:24.3509624Z Compile requests executed 347 2025-12-04T15:43:24.3509974Z Cache hits 166 2025-12-04T15:43:24.3510266Z Cache hits (C/C++) 166 2025-12-04T15:43:24.3510559Z Cache misses 181 2025-12-04T15:43:24.3510856Z Cache misses (C/C++) 181 2025-12-04T15:43:24.3511162Z Cache hits rate 47.84 % 2025-12-04T15:43:24.3511466Z Cache hits rate (C/C++) 47.84 % 2025-12-04T15:43:24.3511777Z Cache timeouts 0 2025-12-04T15:43:24.3512065Z Cache read errors 0 2025-12-04T15:43:24.3512370Z Forced recaches 0 2025-12-04T15:43:24.3512659Z Cache write errors 0 2025-12-04T15:43:24.3512942Z Cache errors 0 2025-12-04T15:43:24.3513232Z 
Compilations 181 2025-12-04T15:43:24.3513557Z Compilation failures 0 2025-12-04T15:43:24.3513907Z Non-cacheable compilations 0 2025-12-04T15:43:24.3514220Z Non-cacheable calls 173 2025-12-04T15:43:24.3514558Z Non-compilation calls 2959 2025-12-04T15:43:24.3514970Z Unsupported compiler calls 0 2025-12-04T15:43:24.3515415Z Average cache write 0.049 s 2025-12-04T15:43:24.3515780Z Average compiler 5.973 s 2025-12-04T15:43:24.3516115Z Average cache read hit 0.031 s 2025-12-04T15:43:24.3516428Z Failed distributed compilations 0 2025-12-04T15:43:24.3516649Z 2025-12-04T15:43:24.3516748Z Non-cacheable reasons: 2025-12-04T15:43:24.3517024Z unknown source language 138 2025-12-04T15:43:24.3517338Z -E 35 2025-12-04T15:43:24.3517562Z 2025-12-04T15:43:24.3517805Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-12-04T15:43:24.3518273Z Version (client) 0.10.0 2025-12-04T15:43:24.3518599Z + sccache --stop-server 2025-12-04T15:43:24.3540038Z Stopping sccache server... 2025-12-04T15:43:24.3543747Z Compile requests 3479 2025-12-04T15:43:24.3544097Z Compile requests executed 347 2025-12-04T15:43:24.3544418Z Cache hits 166 2025-12-04T15:43:24.3544720Z Cache hits (C/C++) 166 2025-12-04T15:43:24.3545016Z Cache misses 181 2025-12-04T15:43:24.3545316Z Cache misses (C/C++) 181 2025-12-04T15:43:24.3545611Z Cache hits rate 47.84 % 2025-12-04T15:43:24.3545921Z Cache hits rate (C/C++) 47.84 % 2025-12-04T15:43:24.3546233Z Cache timeouts 0 2025-12-04T15:43:24.3546518Z Cache read errors 0 2025-12-04T15:43:24.3546813Z Forced recaches 0 2025-12-04T15:43:24.3547105Z Cache write errors 0 2025-12-04T15:43:24.3547389Z Cache errors 0 2025-12-04T15:43:24.3547680Z Compilations 181 2025-12-04T15:43:24.3547986Z Compilation failures 0 2025-12-04T15:43:24.3548370Z Non-cacheable compilations 0 2025-12-04T15:43:24.3548680Z Non-cacheable calls 173 2025-12-04T15:43:24.3548982Z Non-compilation calls 2959 2025-12-04T15:43:24.3549457Z Unsupported compiler calls 0 2025-12-04T15:43:24.3549833Z 
Average cache write 0.049 s 2025-12-04T15:43:24.3550205Z Average compiler 5.973 s 2025-12-04T15:43:24.3550565Z Average cache read hit 0.031 s 2025-12-04T15:43:24.3550888Z Failed distributed compilations 0 2025-12-04T15:43:24.3551219Z 2025-12-04T15:43:24.3551314Z Non-cacheable reasons: 2025-12-04T15:43:24.3551574Z unknown source language 138 2025-12-04T15:43:24.3551866Z -E 35 2025-12-04T15:43:24.3552070Z 2025-12-04T15:43:24.3552303Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-12-04T15:43:24.3552748Z Version (client) 0.10.0 2025-12-04T15:43:24.3561261Z + echo ::endgroup:: 2025-12-04T15:43:24.3561900Z ##[endgroup] 2025-12-04T15:43:24.3562122Z + cleanup_workspace 2025-12-04T15:43:24.3562617Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.' 2025-12-04T15:43:24.3563402Z sudo may print the following warning message that can be ignored. The chown command will still run. 2025-12-04T15:43:24.3564030Z + echo ' sudo: setrlimit(RLIMIT_STACK): Operation not permitted' 2025-12-04T15:43:24.3564495Z sudo: setrlimit(RLIMIT_STACK): Operation not permitted 2025-12-04T15:43:24.3565079Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42' 2025-12-04T15:43:24.3565738Z For more details refer to https://github.com/sudo-project/sudo/issues/42 2025-12-04T15:43:24.3566294Z + sudo chown -R 1000 /var/lib/jenkins/workspace 2025-12-04T15:43:25.4614688Z ##[error]Process completed with exit code 1. 
2025-12-04T15:43:25.4687171Z Prepare all required actions 2025-12-04T15:43:25.4687540Z Getting action download info 2025-12-04T15:43:25.6354105Z ##[group]Run ./.github/actions/pytest-cache-upload 2025-12-04T15:43:25.6354433Z with: 2025-12-04T15:43:25.6354637Z cache_dir: .pytest_cache 2025-12-04T15:43:25.6354889Z shard: 2 2025-12-04T15:43:25.6355121Z sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T15:43:25.6355443Z test_config: default 2025-12-04T15:43:25.6355828Z job_identifier: periodic_linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck 2025-12-04T15:43:25.6356292Z env: 2025-12-04T15:43:25.6356496Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:25.6356757Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:25.6357056Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:25.6357611Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:25.6358114Z ##[endgroup] 2025-12-04T15:43:25.6391618Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T15:43:25.6391937Z with: 2025-12-04T15:43:25.6392136Z shell: bash 2025-12-04T15:43:25.6392341Z timeout_minutes: 5 2025-12-04T15:43:25.6392579Z max_attempts: 5 2025-12-04T15:43:25.6392804Z retry_wait_seconds: 30 2025-12-04T15:43:25.6393126Z command: set -eu python3 -m pip install boto3==1.35.42 2025-12-04T15:43:25.6393512Z polling_interval_seconds: 1 2025-12-04T15:43:25.6393783Z warning_on_retry: true 2025-12-04T15:43:25.6394032Z continue_on_error: false 2025-12-04T15:43:25.6394277Z env: 2025-12-04T15:43:25.6394480Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:25.6394727Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:25.6395030Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:25.6395597Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:25.6396098Z ##[endgroup] 2025-12-04T15:43:26.1264914Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T15:43:27.4035499Z Collecting 
boto3==1.35.42 2025-12-04T15:43:27.4446770Z Downloading boto3-1.35.42-py3-none-any.whl (139 kB) 2025-12-04T15:43:27.4607346Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0) 2025-12-04T15:43:28.8282527Z Collecting botocore<1.36.0,>=1.35.42 2025-12-04T15:43:28.8320434Z Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB) 2025-12-04T15:43:29.0251330Z Collecting s3transfer<0.11.0,>=0.10.0 2025-12-04T15:43:29.0288694Z Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB) 2025-12-04T15:43:29.0382979Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1) 2025-12-04T15:43:29.0392177Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10) 2025-12-04T15:43:29.2523763Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0) 2025-12-04T15:43:29.3442818Z Installing collected packages: botocore, s3transfer, boto3 2025-12-04T15:43:29.9688403Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4 2025-12-04T15:43:30.7218914Z Command completed after 1 attempt(s). 
2025-12-04T15:43:30.7294938Z ##[group]Run python3 .github/scripts/pytest_cache.py \ 2025-12-04T15:43:30.7306934Z python3 .github/scripts/pytest_cache.py \ 2025-12-04T15:43:30.7307290Z  --upload \ 2025-12-04T15:43:30.7307584Z  --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \ 2025-12-04T15:43:30.7308288Z  --pr_identifier "$GITHUB_REF" \ 2025-12-04T15:43:30.7308624Z  --job_identifier "$JOB_IDENTIFIER" \ 2025-12-04T15:43:30.7308936Z  --sha "$SHA" \ 2025-12-04T15:43:30.7309208Z  --test_config "$TEST_CONFIG" \ 2025-12-04T15:43:30.7309508Z  --shard "$SHARD" \ 2025-12-04T15:43:30.7310001Z  --repo "$REPO" \ 2025-12-04T15:43:30.7310291Z  --temp_dir "$RUNNER_TEMP" \ 2025-12-04T15:43:30.7325283Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:43:30.7325655Z env: 2025-12-04T15:43:30.7325862Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:30.7326117Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:30.7326425Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:30.7326988Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:30.7327500Z CACHE_DIR: .pytest_cache 2025-12-04T15:43:30.7327895Z JOB_IDENTIFIER: periodic_linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck 2025-12-04T15:43:30.7328381Z SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T15:43:30.7328708Z TEST_CONFIG: default 2025-12-04T15:43:30.7328931Z SHARD: 2 2025-12-04T15:43:30.7329139Z REPO: pytorch/pytorch 2025-12-04T15:43:30.7329381Z ##[endgroup] 2025-12-04T15:43:31.1444822Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013` 2025-12-04T15:43:31.1446948Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='periodic_linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='default', shard='2', repo='pytorch/pytorch', 
temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None) 2025-12-04T15:43:31.1449083Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache 2025-12-04T15:43:31.1450427Z to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_8-py3-gcc11-slow-gradcheck/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2 2025-12-04T15:43:31.1452575Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_8-py3-gcc11-slow-gradcheck/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2.zip 2025-12-04T15:43:31.1454616Z to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_8-py3-gcc11-slow-gradcheck/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2.zip 2025-12-04T15:43:31.2032340Z ##[group]Run cat test/**/*_toprint.log || true 2025-12-04T15:43:31.2032736Z cat test/**/*_toprint.log || true 2025-12-04T15:43:31.2041969Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:43:31.2042333Z env: 2025-12-04T15:43:31.2042529Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:31.2042805Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:31.2043349Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:31.2043940Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:31.2044434Z ##[endgroup] 2025-12-04T15:43:31.2156707Z cat: 'test/**/*_toprint.log': No such file or directory 2025-12-04T15:43:31.2187972Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2025-12-04T15:43:31.2188329Z kill "$MONITOR_SCRIPT_PID" 2025-12-04T15:43:31.2196711Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:43:31.2197076Z env: 2025-12-04T15:43:31.2197278Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:31.2197535Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:31.2197835Z GPU_FLAG: --gpus 
all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:31.2198390Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:31.2198897Z MONITOR_SCRIPT_PID: 59410 2025-12-04T15:43:31.2199148Z ##[endgroup] 2025-12-04T15:43:31.2230138Z /home/ec2-user/actions-runner/_work/_temp/6b52e012-4c76-4fc2-a68d-eb54305df0ff.sh: line 1: kill: (59410) - No such process 2025-12-04T15:43:31.2234060Z ##[error]Process completed with exit code 1. 2025-12-04T15:43:31.2362511Z Prepare all required actions 2025-12-04T15:43:31.2362906Z Getting action download info 2025-12-04T15:43:31.4133693Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T15:43:31.6500504Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-12-04T15:43:32.1650666Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-12-04T15:43:32.1651020Z with: 2025-12-04T15:43:32.1651367Z file-suffix: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212 2025-12-04T15:43:32.1651821Z s3-bucket: gha-artifacts 2025-12-04T15:43:32.1652071Z env: 2025-12-04T15:43:32.1652262Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:32.1652518Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:32.1652833Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:32.1653384Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:32.1653919Z ##[endgroup] 2025-12-04T15:43:32.1696633Z ##[group]Run # Remove any previous test jsons if they exist 2025-12-04T15:43:32.1697100Z # Remove any previous test jsons if they exist 2025-12-04T15:43:32.1697474Z rm -f test-jsons-*.zip 2025-12-04T15:43:32.1697904Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json' 2025-12-04T15:43:32.1707339Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:43:32.1707708Z env: 2025-12-04T15:43:32.1708260Z GIT_DEFAULT_BRANCH: 
main 2025-12-04T15:43:32.1708553Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:32.1708856Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:32.1709404Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:32.1710047Z FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212 2025-12-04T15:43:32.1710469Z ##[endgroup] 2025-12-04T15:43:32.1937561Z adding: test/test-reports/td_exclusions-8f4b859dc7ee5c40b00d.json (deflated 82%) 2025-12-04T15:43:32.1947921Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d2163ec8f4306bf7.json (deflated 94%) 2025-12-04T15:43:32.1978483Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-7dfb99a0e36ebc6b.json (deflated 94%) 2025-12-04T15:43:32.1983920Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f45bd9366a90530e.json (deflated 96%) 2025-12-04T15:43:32.1990221Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-85306c1f70284b1c.json (deflated 96%) 2025-12-04T15:43:32.2006994Z adding: test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-e8dc2e2d2922989b.json (deflated 94%) 2025-12-04T15:43:32.2009020Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.json (deflated 88%) 2025-12-04T15:43:32.2011161Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.json (deflated 88%) 2025-12-04T15:43:32.2012803Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.json (deflated 88%) 2025-12-04T15:43:32.2014878Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.json (deflated 88%) 2025-12-04T15:43:32.2016550Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.json (deflated 88%) 2025-12-04T15:43:32.2018717Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.json (deflated 88%) 2025-12-04T15:43:32.2020493Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.json (deflated 88%) 2025-12-04T15:43:32.2022814Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.json (deflated 88%) 2025-12-04T15:43:32.2024182Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.json (deflated 88%) 2025-12-04T15:43:32.2026303Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.json (deflated 88%) 2025-12-04T15:43:32.2027952Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.json (deflated 88%) 2025-12-04T15:43:32.2030022Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.json (deflated 88%) 2025-12-04T15:43:32.2031667Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.json (deflated 88%) 2025-12-04T15:43:32.2033772Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.json (deflated 88%) 2025-12-04T15:43:32.2035413Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.json (deflated 88%) 2025-12-04T15:43:32.2038319Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.json (deflated 91%) 2025-12-04T15:43:32.2039951Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.json (deflated 88%) 2025-12-04T15:43:32.2042066Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.json (deflated 88%) 2025-12-04T15:43:32.2043788Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.json (deflated 88%) 2025-12-04T15:43:32.2045768Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.json (deflated 88%) 2025-12-04T15:43:32.2047399Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.json (deflated 88%) 2025-12-04T15:43:32.2049460Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.json (deflated 88%) 2025-12-04T15:43:32.2051085Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.json (deflated 88%) 2025-12-04T15:43:32.2053088Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.json (deflated 88%) 2025-12-04T15:43:32.2054736Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.json (deflated 88%) 2025-12-04T15:43:32.2056754Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.json (deflated 88%) 2025-12-04T15:43:32.2058413Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.json (deflated 88%) 2025-12-04T15:43:32.2060480Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.json (deflated 88%) 2025-12-04T15:43:32.2062106Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.json (deflated 88%) 2025-12-04T15:43:32.2064105Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.json (deflated 88%) 2025-12-04T15:43:32.2066843Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.json (deflated 91%) 2025-12-04T15:43:32.2069458Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.json (deflated 90%) 2025-12-04T15:43:32.2071993Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.json (deflated 90%) 2025-12-04T15:43:32.2073695Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.json (deflated 88%) 2025-12-04T15:43:32.2076404Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.json (deflated 90%) 2025-12-04T15:43:32.2079020Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.json (deflated 90%) 2025-12-04T15:43:32.2081288Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.json (deflated 89%) 2025-12-04T15:43:32.2082850Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.json (deflated 88%) 2025-12-04T15:43:32.2085649Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.json (deflated 88%) 2025-12-04T15:43:32.2087362Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.json (deflated 88%) 2025-12-04T15:43:32.2089485Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.json (deflated 88%) 2025-12-04T15:43:32.2091180Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.json (deflated 88%) 2025-12-04T15:43:32.2093264Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.json (deflated 88%) 2025-12-04T15:43:32.2095196Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.json (deflated 88%) 2025-12-04T15:43:32.2096858Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.json (deflated 88%) 2025-12-04T15:43:32.2098926Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.json (deflated 88%) 2025-12-04T15:43:32.2100922Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.json (deflated 88%) 2025-12-04T15:43:32.2102631Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.json (deflated 88%) 2025-12-04T15:43:32.2104692Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.json (deflated 88%) 2025-12-04T15:43:32.2106389Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.json (deflated 88%) 2025-12-04T15:43:32.2109259Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.json (deflated 88%) 2025-12-04T15:43:32.2110904Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.json (deflated 88%) 2025-12-04T15:43:32.2112523Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.json (deflated 88%) 2025-12-04T15:43:32.2114585Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.json (deflated 88%) 2025-12-04T15:43:32.2116320Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.json (deflated 88%) 2025-12-04T15:43:32.2118340Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.json (deflated 88%) 2025-12-04T15:43:32.2120051Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.json (deflated 88%) 2025-12-04T15:43:32.2122086Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.json (deflated 88%) 2025-12-04T15:43:32.2123883Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.json (deflated 88%) 2025-12-04T15:43:32.2125761Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.json (deflated 88%) 2025-12-04T15:43:32.2127663Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.json (deflated 88%) 2025-12-04T15:43:32.2129559Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.json (deflated 88%) 2025-12-04T15:43:32.2131473Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.json (deflated 88%) 2025-12-04T15:43:32.2133325Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.json (deflated 88%) 2025-12-04T15:43:32.2135391Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.json (deflated 88%) 2025-12-04T15:43:32.2136899Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.json (deflated 88%) 2025-12-04T15:43:32.2140228Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.json (deflated 92%) 2025-12-04T15:43:32.2141897Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.json (deflated 88%) 2025-12-04T15:43:32.2143892Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.json (deflated 88%) 2025-12-04T15:43:32.2145563Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.json (deflated 88%) 2025-12-04T15:43:32.2147506Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.json (deflated 88%) 2025-12-04T15:43:32.2149147Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.json (deflated 88%) 2025-12-04T15:43:32.2151691Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.json (deflated 90%) 2025-12-04T15:43:32.2153374Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.json (deflated 88%) 2025-12-04T15:43:32.2155337Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.json (deflated 88%) 2025-12-04T15:43:32.2157011Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.json (deflated 88%) 2025-12-04T15:43:32.2158969Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.json (deflated 88%) 2025-12-04T15:43:32.2160581Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.json (deflated 88%) 2025-12-04T15:43:32.2163115Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.json (deflated 90%) 2025-12-04T15:43:32.2164760Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.json (deflated 88%) 2025-12-04T15:43:32.2166699Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.json (deflated 88%) 2025-12-04T15:43:32.2168320Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.json (deflated 88%) 2025-12-04T15:43:32.2170277Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.json (deflated 88%) 2025-12-04T15:43:32.2171897Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.json (deflated 88%) 2025-12-04T15:43:32.2174997Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.json (deflated 91%) 2025-12-04T15:43:32.2176993Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.json (deflated 89%) 2025-12-04T15:43:32.2178901Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.json (deflated 89%) 2025-12-04T15:43:32.2180943Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.json (deflated 89%) 2025-12-04T15:43:32.2182871Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.json (deflated 89%) 2025-12-04T15:43:32.2184823Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.json (deflated 89%) 2025-12-04T15:43:32.2186731Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.json (deflated 89%) 2025-12-04T15:43:32.2188680Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.json (deflated 89%) 2025-12-04T15:43:32.2190569Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.json (deflated 89%) 2025-12-04T15:43:32.2192620Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.json (deflated 89%) 2025-12-04T15:43:32.2194293Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.json (deflated 89%) 2025-12-04T15:43:32.2196361Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.json (deflated 89%) 2025-12-04T15:43:32.2198254Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.json (deflated 89%) 2025-12-04T15:43:32.2200197Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.json (deflated 89%) 2025-12-04T15:43:32.2202101Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.json (deflated 89%) 2025-12-04T15:43:32.2204081Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.json (deflated 89%) 2025-12-04T15:43:32.2205990Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.json (deflated 89%) 2025-12-04T15:43:32.2208213Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.json (deflated 89%) 2025-12-04T15:43:32.2210643Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.json (deflated 88%) 2025-12-04T15:43:32.2212560Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.json (deflated 88%) 2025-12-04T15:43:32.2214123Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.json (deflated 88%) 2025-12-04T15:43:32.2218681Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.json (deflated 98%) 2025-12-04T15:43:32.2219868Z adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-fcf8b9b0a2e7a178.json (deflated 93%) 2025-12-04T15:43:32.2239955Z adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-cc2491bbd877af9c.json (deflated 95%) 2025-12-04T15:43:32.2244610Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-66246eed1b64fd5c.json (deflated 89%) 2025-12-04T15:43:32.2339845Z adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-38411ac3079c7061.json (deflated 95%) 2025-12-04T15:43:32.2343364Z adding: test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-ca7327bb8f17c961.json (deflated 84%) 2025-12-04T15:43:32.2346379Z adding: test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-b7f63c3b423acf1d.json (deflated 91%) 2025-12-04T15:43:32.2348304Z adding: test/test-reports/python-pytest/dynamo.test_callback/dynamo.test_callback-6c0ee54264bcedf0.json (deflated 82%) 2025-12-04T15:43:32.2349572Z adding: test/test-reports/python-pytest/inductor.test_custom_op_autotune/inductor.test_custom_op_autotune-8f7d8d00cc13374f.json (deflated 80%) 2025-12-04T15:43:32.2353560Z adding: test/test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.json (deflated 90%) 2025-12-04T15:43:32.2382850Z adding: 
test/test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.json (deflated 97%) 2025-12-04T15:43:32.2383939Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.json (deflated 91%) 2025-12-04T15:43:32.2385195Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.json (deflated 91%) 2025-12-04T15:43:32.2386493Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.json (deflated 91%) 2025-12-04T15:43:32.2387855Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.json (deflated 91%) 2025-12-04T15:43:32.2389443Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.json (deflated 91%) 2025-12-04T15:43:32.2415355Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.json (deflated 97%) 2025-12-04T15:43:32.2422886Z adding: test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-95ccd07868721469.json (deflated 95%) 2025-12-04T15:43:32.2437838Z adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-1e96fc6cc9093b07.json (deflated 96%) 2025-12-04T15:43:32.2453997Z adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-91f289dc18834c3e.json (deflated 96%) 2025-12-04T15:43:32.2500073Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-05b5b699aba88456.json (deflated 95%) 2025-12-04T15:43:32.2501145Z adding: test/test-reports/python-pytest/dynamo.test_after_aot/dynamo.test_after_aot-138e4478191117d7.json (deflated 59%) 2025-12-04T15:43:32.2503972Z adding: test/test-reports/python-pytest/inductor.test_snode_runtime/inductor.test_snode_runtime-f1ec066e866be26d.json (deflated 92%) 2025-12-04T15:43:32.2543869Z adding: test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-bf57fb8d20e32a72.json (deflated 93%) 2025-12-04T15:43:32.2578562Z adding: 
test/test-reports/python-pytest/test_testing/test_testing-4c4caba52af0adff.json (deflated 97%) 2025-12-04T15:43:32.2579761Z adding: test/test-reports/python-pytest/inductor.test_autoheuristic/inductor.test_autoheuristic-10f7d7896ce04bc8.json (stored 0%) 2025-12-04T15:43:32.2580949Z adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-c4d4e9aba2280ad9.json (deflated 92%) 2025-12-04T15:43:32.2582254Z adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-8a04be886b6d69cf.json (deflated 82%) 2025-12-04T15:43:32.2583524Z adding: test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-c7e05865cddca77f.json (deflated 74%) 2025-12-04T15:43:32.2584915Z adding: test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6d20a7277844030b.json (deflated 74%) 2025-12-04T15:43:32.2586309Z adding: test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-6a2d2929a87aa7f5.json (deflated 83%) 2025-12-04T15:43:32.2587705Z adding: test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-b498ae4cc20525c9.json (deflated 70%) 2025-12-04T15:43:32.2589102Z adding: test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-4c5fe50d62df582d.json (deflated 62%) 2025-12-04T15:43:32.2590233Z adding: test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-179ecdae5d21ef0e.json (deflated 66%) 2025-12-04T15:43:32.2591446Z adding: test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-bacbff1a865ff8bb.json (deflated 62%) 2025-12-04T15:43:32.2592657Z adding: test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-e71c26709471ff2e.json (deflated 51%) 2025-12-04T15:43:32.2593863Z adding: 
test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-45ff8ae422230f99.json (deflated 90%) 2025-12-04T15:43:32.2595116Z adding: test/test-reports/python-pytest/inductor.test_provenance_tracing/inductor.test_provenance_tracing-6455ccf06df051be.json (deflated 87%) 2025-12-04T15:43:32.2596324Z adding: test/test-reports/python-pytest/inductor.test_memory_planning/inductor.test_memory_planning-d9b25b367275156e.json (deflated 71%) 2025-12-04T15:43:32.2638887Z adding: test/test-reports/python-pytest/export.test_cpp_serdes/export.test_cpp_serdes-72e11f38870e0d13.json (deflated 96%) 2025-12-04T15:43:32.2657839Z adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5ad0fee917746162.json (deflated 97%) 2025-12-04T15:43:32.2660697Z adding: test/test-reports/python-pytest/test_sort_and_select/test_sort_and_select-049427debff60b53.json (deflated 95%) 2025-12-04T15:43:32.2661802Z adding: test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-cccd30d217a8d074.json (deflated 88%) 2025-12-04T15:43:32.2664460Z adding: test/test-reports/python-pytest/test_package/test_package-a2f65f799bf50b4a.json (deflated 93%) 2025-12-04T15:43:32.2665439Z adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-c19a0c4320bf6e65.json (deflated 64%) 2025-12-04T15:43:32.2666623Z adding: test/test-reports/python-pytest/test_utils_config_module/test_utils_config_module-cd73bdff208ab311.json (deflated 90%) 2025-12-04T15:43:32.2667655Z adding: test/test-reports/python-pytest/test_hop_infra/test_hop_infra-d1efcb546b726ee3.json (deflated 72%) 2025-12-04T15:43:32.2668735Z adding: test/test-reports/python-pytest/test_appending_byte_serializer/test_appending_byte_serializer-db1af3fc87bd6240.json (deflated 76%) 2025-12-04T15:43:32.2670211Z adding: test/test-reports/python-pytest/test_ao_sparsity/test_ao_sparsity-47b60e8cb29a5ef6.json (deflated 91%) 2025-12-04T15:43:32.2671206Z 
adding: test/test-reports/python-pytest/test_extension_utils/test_extension_utils-5e3baa267a09a3bb.json (deflated 64%) 2025-12-04T15:43:32.2672460Z adding: test/test-reports/python-pytest/nn.attention.test_fa4/nn.attention.test_fa4-2d55ad78ccee943a.json (deflated 98%) 2025-12-04T15:43:32.2678524Z adding: test/test-reports/python-pytest/typing.test_python_operators/typing.test_python_operators-7b01e9f4c56696ce.json (deflated 98%) 2025-12-04T15:43:32.2679611Z adding: test/test-reports/python-pytest/torch_np.test_dtype/torch_np.test_dtype-50c590a3e827391c.json (deflated 96%) 2025-12-04T15:43:32.2680544Z adding: test/test-reports/python-pytest/test_file_check/test_file_check-c5f916d4f839abe2.json (deflated 61%) 2025-12-04T15:43:32.2681513Z adding: test/test-reports/python-pytest/profiler.test_kineto/profiler.test_kineto-1437f02ea71dbd19.json (deflated 37%) 2025-12-04T15:43:32.2682596Z adding: test/test-reports/python-pytest/functorch.test_ac_knapsack/functorch.test_ac_knapsack-a2f3dae1f99bc885.json (deflated 87%) 2025-12-04T15:43:32.2717066Z adding: test/test-reports/python-pytest/torch_np.test_nep50_examples/torch_np.test_nep50_examples-87e42828c2fde829.json (deflated 99%) 2025-12-04T15:43:32.2738870Z adding: test/test-reports/python-pytest/test_torch/test_torch-6322eeaa434bd119.json (deflated 95%) 2025-12-04T15:43:32.2739766Z adding: test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-6cf9ed264c8fa189.json (stored 0%) 2025-12-04T15:43:32.2944934Z adding: test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-510898c7a9dfb9c9.json (deflated 98%) 2025-12-04T15:43:32.2963216Z adding: test/test-reports/python-pytest/test_modules/test_modules-1ceed37f0450876d.json (deflated 96%) 2025-12-04T15:43:32.2969484Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-320a7bc7a2da135c.json (deflated 97%) 2025-12-04T15:43:32.2972868Z adding: 
test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-9c6a851d43187f63.json (deflated 97%) 2025-12-04T15:43:32.2973994Z adding: test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-612fe6974f2e86fb.json (deflated 33%) 2025-12-04T15:43:32.2975000Z adding: test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-573eaa6de6818c33.json (deflated 94%) 2025-12-04T15:43:32.2975961Z adding: test/test-reports/python-pytest/test_shape_ops/test_shape_ops-8ae5e584fb53bb5e.json (deflated 96%) 2025-12-04T15:43:32.2977369Z adding: test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-419c9aea1e4e06f2.json (deflated 87%) 2025-12-04T15:43:32.2981569Z adding: test/test-reports/python-pytest/test_indexing/test_indexing-bb3db4f55bab2e87.json (deflated 95%) 2025-12-04T15:43:32.2982671Z adding: test/test-reports/python-pytest/test_type_info/test_type_info-3cbecfd6afe8711f.json (deflated 83%) 2025-12-04T15:43:32.3002128Z adding: test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-3265775c77799c99.json (deflated 95%) 2025-12-04T15:43:32.3003708Z adding: test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5e8dbe55d5e60a97.json (deflated 95%) 2025-12-04T15:43:32.3006237Z adding: test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-339f2b8a0ba2c562.json (deflated 94%) 2025-12-04T15:43:32.3008000Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-7a9eb44e36e96ef2.json (deflated 95%) 2025-12-04T15:43:32.3009980Z adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-8a1338a601c4ef0b.json (deflated 91%) 2025-12-04T15:43:32.3011008Z adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-d08ca7b1f6355251.json (deflated 88%) 2025-12-04T15:43:32.3011983Z 
adding: test/test-reports/python-pytest/nn.test_init/nn.test_init-bb3f84e769cc626f.json (deflated 91%) 2025-12-04T15:43:32.3012939Z adding: test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-081f0752aeda15ae.json (deflated 83%) 2025-12-04T15:43:32.3021435Z adding: test/test-reports/python-pytest/test_type_promotion/test_type_promotion-3f39f26aca555a70.json (deflated 98%) 2025-12-04T15:43:32.3094715Z adding: test/test-reports/python-pytest/test_reductions/test_reductions-31a848701d5079bd.json (deflated 98%) 2025-12-04T15:43:32.3095747Z adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204154318.json (deflated 38%) 2025-12-04T15:43:32.3125878Z ##[group]Run # Remove any previous test reports if they exist 2025-12-04T15:43:32.3126365Z # Remove any previous test reports if they exist 2025-12-04T15:43:32.3126771Z rm -f test-reports-*.zip 2025-12-04T15:43:32.3127265Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv' 2025-12-04T15:43:32.3136520Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:43:32.3136899Z env: 2025-12-04T15:43:32.3137114Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:32.3137379Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:32.3137696Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:32.3138275Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:32.3139034Z FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212 2025-12-04T15:43:32.3139611Z ##[endgroup] 2025-12-04T15:43:32.3287611Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d2163ec8f4306bf7.xml (deflated 93%) 2025-12-04T15:43:32.3312706Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-7dfb99a0e36ebc6b.xml (deflated 93%) 2025-12-04T15:43:32.3317191Z adding: 
test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f45bd9366a90530e.xml (deflated 92%) 2025-12-04T15:43:32.3322051Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-85306c1f70284b1c.xml (deflated 93%) 2025-12-04T15:43:32.3337627Z adding: test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-e8dc2e2d2922989b.xml (deflated 94%) 2025-12-04T15:43:32.3339407Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db644bb4b324bdb7.xml (deflated 88%) 2025-12-04T15:43:32.3341706Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e914e8d16d69105.xml (deflated 88%) 2025-12-04T15:43:32.3343333Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-50bc30595f88ffc2.xml (deflated 88%) 2025-12-04T15:43:32.3345705Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d884b8d5e3e94e48.xml (deflated 88%) 2025-12-04T15:43:32.3347155Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a69bb89413c2540.xml (deflated 88%) 2025-12-04T15:43:32.3349577Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7faecec052419cbd.xml (deflated 88%) 2025-12-04T15:43:32.3351120Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6010875b22bc8ac8.xml (deflated 88%) 2025-12-04T15:43:32.3353294Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-620e7f1dee165307.xml (deflated 88%) 2025-12-04T15:43:32.3355328Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dca78723954b543e.xml (deflated 88%) 2025-12-04T15:43:32.3357378Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47baa69476236f0d.xml (deflated 88%) 2025-12-04T15:43:32.3359335Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-65e1914bc5a98b68.xml (deflated 88%) 2025-12-04T15:43:32.3361191Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c5bb515d6d359bff.xml (deflated 88%) 2025-12-04T15:43:32.3363111Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c1e30c86a333739.xml (deflated 88%) 2025-12-04T15:43:32.3365336Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b90d6b6da610445.xml (deflated 88%) 2025-12-04T15:43:32.3366942Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a072eaf7c7952381.xml (deflated 88%) 2025-12-04T15:43:32.3370057Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed057c1fcbb94799.xml (deflated 91%) 2025-12-04T15:43:32.3371665Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-968a15629836e8e5.xml (deflated 88%) 2025-12-04T15:43:32.3373933Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-18435a001689398e.xml (deflated 88%) 2025-12-04T15:43:32.3375518Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1fb9f6c6a48b6e6d.xml (deflated 88%) 2025-12-04T15:43:32.3377625Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-065446534c616beb.xml (deflated 88%) 2025-12-04T15:43:32.3379395Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1d2a8ce3b4b5886.xml (deflated 88%) 2025-12-04T15:43:32.3381480Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-674b8dc9404ae6b8.xml (deflated 88%) 2025-12-04T15:43:32.3383183Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cc9561c0d8657b8b.xml (deflated 88%) 2025-12-04T15:43:32.3385282Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-404b9c4e952131ee.xml (deflated 88%) 2025-12-04T15:43:32.3386980Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20c0b66776858372.xml (deflated 88%) 2025-12-04T15:43:32.3389115Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9d5c83fadba9e9ce.xml (deflated 88%) 2025-12-04T15:43:32.3390775Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1dab991d3317abd8.xml (deflated 88%) 2025-12-04T15:43:32.3392937Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f085652b7427a496.xml (deflated 88%) 2025-12-04T15:43:32.3394860Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95ee7103f62e55b.xml (deflated 88%) 2025-12-04T15:43:32.3396572Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-654154445c092fde.xml (deflated 88%) 2025-12-04T15:43:32.3399633Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ba0dc36db419dab.xml (deflated 90%) 2025-12-04T15:43:32.3402364Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43b9de002e57b201.xml (deflated 90%) 2025-12-04T15:43:32.3405250Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72bbb4ffa50a3b8f.xml (deflated 90%) 2025-12-04T15:43:32.3406909Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dc95e06403bf2cf4.xml (deflated 88%) 2025-12-04T15:43:32.3410375Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2ecda6633877c191.xml (deflated 90%) 2025-12-04T15:43:32.3412918Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-94525e2604bd2c48.xml (deflated 90%) 2025-12-04T15:43:32.3415143Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cccfa58813c47b76.xml (deflated 88%) 2025-12-04T15:43:32.3417139Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b15afc0b67ce9f18.xml (deflated 88%) 2025-12-04T15:43:32.3419048Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf4a147c19ee9f9e.xml (deflated 88%) 2025-12-04T15:43:32.3421932Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2bc800e39b37121b.xml (deflated 88%) 2025-12-04T15:43:32.3423835Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-72539e4fc7965791.xml (deflated 88%) 2025-12-04T15:43:32.3426191Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-608bb4821bf56951.xml (deflated 88%) 2025-12-04T15:43:32.3427702Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85b266071ff03d8e.xml (deflated 88%) 2025-12-04T15:43:32.3429884Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a17686aeac45c48.xml (deflated 88%) 2025-12-04T15:43:32.3431735Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ff45f371a68a266.xml (deflated 88%) 2025-12-04T15:43:32.3433864Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8a4bd414fb0c5364.xml (deflated 88%) 2025-12-04T15:43:32.3435852Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7328c547076beb1d.xml (deflated 88%) 2025-12-04T15:43:32.3437852Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-596b2e62c7fac124.xml (deflated 88%) 2025-12-04T15:43:32.3439836Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf542a7c0dc43236.xml (deflated 88%) 2025-12-04T15:43:32.3441821Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-546aec25444a8171.xml (deflated 88%) 2025-12-04T15:43:32.3443584Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ace0b4cd173725a5.xml (deflated 88%) 2025-12-04T15:43:32.3445752Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407f954f9fdbe9a2.xml (deflated 88%) 2025-12-04T15:43:32.3447477Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f07bf6809a00c18.xml (deflated 88%) 2025-12-04T15:43:32.3449626Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43c2760ba44b88e1.xml (deflated 88%) 2025-12-04T15:43:32.3451480Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2733070510d0d7a0.xml (deflated 88%) 2025-12-04T15:43:32.3453556Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c25d1f42247a8b43.xml (deflated 88%) 2025-12-04T15:43:32.3455536Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b96432f0a6e31e1a.xml (deflated 88%) 2025-12-04T15:43:32.3457517Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-05a266b06f355d9d.xml (deflated 88%) 2025-12-04T15:43:32.3459532Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-407c474e66b2be17.xml (deflated 88%) 2025-12-04T15:43:32.3461403Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a744f37671c4da2.xml (deflated 88%) 2025-12-04T15:43:32.3463529Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db9087acaf81b17b.xml (deflated 88%) 2025-12-04T15:43:32.3465580Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09dd3cb88118f907.xml (deflated 88%) 2025-12-04T15:43:32.3467290Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dab12f6d7c9d4445.xml (deflated 88%) 2025-12-04T15:43:32.3469389Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f8604402cca2c77a.xml (deflated 88%) 2025-12-04T15:43:32.3471265Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ab50a9e09cebe56.xml (deflated 88%) 2025-12-04T15:43:32.3473273Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c7795e09b597066.xml (deflated 88%) 2025-12-04T15:43:32.3476459Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-635ebbc043d5848c.xml (deflated 91%) 2025-12-04T15:43:32.3478187Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0323881f8d7298c5.xml (deflated 88%) 2025-12-04T15:43:32.3480329Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5a621381cede67e.xml (deflated 88%) 2025-12-04T15:43:32.3481995Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f968ba007e1cde.xml (deflated 88%) 2025-12-04T15:43:32.3484090Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6692c2260aa4878d.xml (deflated 88%) 2025-12-04T15:43:32.3485756Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a3011ae1354516e.xml (deflated 88%) 2025-12-04T15:43:32.3488493Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3611a345e009b2bb.xml (deflated 89%) 2025-12-04T15:43:32.3490341Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0db87822cd439ac7.xml (deflated 88%) 2025-12-04T15:43:32.3492327Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-91f5694b57c1a92f.xml (deflated 88%) 2025-12-04T15:43:32.3494232Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-656b2d48eeee2845.xml (deflated 88%) 2025-12-04T15:43:32.3496119Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b83921f3a8032b56.xml (deflated 88%) 2025-12-04T15:43:32.3498106Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-460ba7ed6dfd0606.xml (deflated 88%) 2025-12-04T15:43:32.3500745Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c79e71370420be79.xml (deflated 89%) 2025-12-04T15:43:32.3502458Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f3a26f5fe94e08e.xml (deflated 88%) 2025-12-04T15:43:32.3504502Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c932c7001b17602.xml (deflated 88%) 2025-12-04T15:43:32.3506191Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4eb2347c5bf53650.xml (deflated 88%) 2025-12-04T15:43:32.3508434Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51cd51dacf8933cc.xml (deflated 88%) 2025-12-04T15:43:32.3511103Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-31bbe61564cb1155.xml (deflated 88%) 2025-12-04T15:43:32.3514335Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2018c4297ae5d1b6.xml (deflated 90%) 2025-12-04T15:43:32.3516456Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a4cfec006d83414c.xml (deflated 88%) 2025-12-04T15:43:32.3518533Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c772b3f4ee0ea2b.xml (deflated 88%) 2025-12-04T15:43:32.3520602Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e35a61438b1031b2.xml (deflated 88%) 2025-12-04T15:43:32.3522706Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-aa66aae00d9f4032.xml (deflated 88%) 2025-12-04T15:43:32.3524939Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-77f4025f4a501300.xml (deflated 88%) 2025-12-04T15:43:32.3526867Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c7d0e605b4f61b13.xml (deflated 88%) 2025-12-04T15:43:32.3528912Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-22a336ab3d10260b.xml (deflated 88%) 2025-12-04T15:43:32.3530982Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a92b8445f99d218.xml (deflated 88%) 2025-12-04T15:43:32.3533051Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26509c918f81ae6a.xml (deflated 88%) 2025-12-04T15:43:32.3535153Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8eaa65e74c65e89.xml (deflated 88%) 2025-12-04T15:43:32.3537220Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c4eee3b8f42472a.xml (deflated 88%) 2025-12-04T15:43:32.3539370Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ced5d6d69db3ee8a.xml (deflated 88%) 2025-12-04T15:43:32.3541476Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cb6146b17613edb9.xml (deflated 88%) 2025-12-04T15:43:32.3543553Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0196d3c8ddbd25a4.xml (deflated 88%) 2025-12-04T15:43:32.3545580Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0473e73ea254c24.xml (deflated 88%) 2025-12-04T15:43:32.3547640Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-30d68f81e81e2edb.xml (deflated 88%) 2025-12-04T15:43:32.3549748Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b49d87ae2a7d48d2.xml (deflated 88%) 2025-12-04T15:43:32.3551446Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-23a9291952ff7830.xml (deflated 88%) 2025-12-04T15:43:32.3553502Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3f835bf66d4dea37.xml (deflated 88%) 2025-12-04T15:43:32.3555168Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c97380397ea111b5.xml (deflated 88%) 2025-12-04T15:43:32.3558865Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bc82533c9ec72f9.xml (deflated 97%) 2025-12-04T15:43:32.3559876Z adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-fcf8b9b0a2e7a178.xml (deflated 90%) 2025-12-04T15:43:32.3577256Z adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-cc2491bbd877af9c.xml (deflated 94%) 
2025-12-04T15:43:32.3581356Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-66246eed1b64fd5c.xml (deflated 87%) 2025-12-04T15:43:32.3663905Z adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-38411ac3079c7061.xml (deflated 95%) 2025-12-04T15:43:32.3665519Z adding: test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-ca7327bb8f17c961.xml (deflated 81%) 2025-12-04T15:43:32.3668898Z adding: test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-b7f63c3b423acf1d.xml (deflated 89%) 2025-12-04T15:43:32.3670466Z adding: test/test-reports/python-pytest/dynamo.test_callback/dynamo.test_callback-6c0ee54264bcedf0.xml (deflated 81%) 2025-12-04T15:43:32.3672900Z adding: test/test-reports/python-pytest/inductor.test_custom_op_autotune/inductor.test_custom_op_autotune-8f7d8d00cc13374f.xml (deflated 79%) 2025-12-04T15:43:32.3677673Z adding: test/test-reports/python-pytest/test_cuda/test_cuda-d53d07fa35c7705a.xml (deflated 86%) 2025-12-04T15:43:32.3702005Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-cd011ec994e887c5.xml (deflated 95%) 2025-12-04T15:43:32.3703452Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-c3d270c5da335531.xml (deflated 90%) 2025-12-04T15:43:32.3705166Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-29a045cc5a13f6ba.xml (deflated 90%) 2025-12-04T15:43:32.3706420Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-581127d49949d608.xml (deflated 90%) 2025-12-04T15:43:32.3708009Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-09ad374497e1f0ca.xml (deflated 90%) 2025-12-04T15:43:32.3711533Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-b3812cd61ae4a2a7.xml (deflated 90%) 2025-12-04T15:43:32.3731105Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-ced76541ffb8f834.xml (deflated 96%) 
2025-12-04T15:43:32.3737527Z adding: test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-95ccd07868721469.xml (deflated 93%) 2025-12-04T15:43:32.3750494Z adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-1e96fc6cc9093b07.xml (deflated 95%) 2025-12-04T15:43:32.3764159Z adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-91f289dc18834c3e.xml (deflated 95%) 2025-12-04T15:43:32.3800025Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-05b5b699aba88456.xml (deflated 93%) 2025-12-04T15:43:32.3801033Z adding: test/test-reports/python-pytest/dynamo.test_after_aot/dynamo.test_after_aot-138e4478191117d7.xml (deflated 52%) 2025-12-04T15:43:32.3803928Z adding: test/test-reports/python-pytest/inductor.test_snode_runtime/inductor.test_snode_runtime-f1ec066e866be26d.xml (deflated 92%) 2025-12-04T15:43:32.3841657Z adding: test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-bf57fb8d20e32a72.xml (deflated 92%) 2025-12-04T15:43:32.3864216Z adding: test/test-reports/python-pytest/test_testing/test_testing-4c4caba52af0adff.xml (deflated 96%) 2025-12-04T15:43:32.3865377Z adding: test/test-reports/python-pytest/inductor.test_autoheuristic/inductor.test_autoheuristic-10f7d7896ce04bc8.xml (deflated 28%) 2025-12-04T15:43:32.3866560Z adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-c4d4e9aba2280ad9.xml (deflated 88%) 2025-12-04T15:43:32.3867936Z adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-8a04be886b6d69cf.xml (deflated 79%) 2025-12-04T15:43:32.3869194Z adding: test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-c7e05865cddca77f.xml (deflated 59%) 2025-12-04T15:43:32.3870455Z adding: 
test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6d20a7277844030b.xml (deflated 64%) 2025-12-04T15:43:32.3871973Z adding: test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-6a2d2929a87aa7f5.xml (deflated 80%) 2025-12-04T15:43:32.3873241Z adding: test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-b498ae4cc20525c9.xml (deflated 63%) 2025-12-04T15:43:32.3874585Z adding: test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-4c5fe50d62df582d.xml (deflated 52%) 2025-12-04T15:43:32.3875726Z adding: test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-179ecdae5d21ef0e.xml (deflated 61%) 2025-12-04T15:43:32.3876858Z adding: test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-bacbff1a865ff8bb.xml (deflated 48%) 2025-12-04T15:43:32.3878068Z adding: test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-e71c26709471ff2e.xml (deflated 50%) 2025-12-04T15:43:32.3879286Z adding: test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-45ff8ae422230f99.xml (deflated 85%) 2025-12-04T15:43:32.3880549Z adding: test/test-reports/python-pytest/inductor.test_provenance_tracing/inductor.test_provenance_tracing-6455ccf06df051be.xml (deflated 85%) 2025-12-04T15:43:32.3881780Z adding: test/test-reports/python-pytest/inductor.test_memory_planning/inductor.test_memory_planning-d9b25b367275156e.xml (deflated 67%) 2025-12-04T15:43:32.3920614Z adding: test/test-reports/python-pytest/export.test_cpp_serdes/export.test_cpp_serdes-72e11f38870e0d13.xml (deflated 96%) 2025-12-04T15:43:32.3938155Z adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5ad0fee917746162.xml (deflated 97%) 2025-12-04T15:43:32.3940095Z adding: 
test/test-reports/python-pytest/test_sort_and_select/test_sort_and_select-049427debff60b53.xml (deflated 91%) 2025-12-04T15:43:32.3941243Z adding: test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-cccd30d217a8d074.xml (deflated 77%) 2025-12-04T15:43:32.3943471Z adding: test/test-reports/python-pytest/test_package/test_package-a2f65f799bf50b4a.xml (deflated 87%) 2025-12-04T15:43:32.3944421Z adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-c19a0c4320bf6e65.xml (deflated 50%) 2025-12-04T15:43:32.3945596Z adding: test/test-reports/python-pytest/test_utils_config_module/test_utils_config_module-cd73bdff208ab311.xml (deflated 82%) 2025-12-04T15:43:32.3946582Z adding: test/test-reports/python-pytest/test_hop_infra/test_hop_infra-d1efcb546b726ee3.xml (deflated 57%) 2025-12-04T15:43:32.3947643Z adding: test/test-reports/python-pytest/test_appending_byte_serializer/test_appending_byte_serializer-db1af3fc87bd6240.xml (deflated 61%) 2025-12-04T15:43:32.3948943Z adding: test/test-reports/python-pytest/test_ao_sparsity/test_ao_sparsity-47b60e8cb29a5ef6.xml (deflated 85%) 2025-12-04T15:43:32.3949917Z adding: test/test-reports/python-pytest/test_extension_utils/test_extension_utils-5e3baa267a09a3bb.xml (deflated 52%) 2025-12-04T15:43:32.3951204Z adding: test/test-reports/python-pytest/nn.attention.test_fa4/nn.attention.test_fa4-2d55ad78ccee943a.xml (deflated 97%) 2025-12-04T15:43:32.3955414Z adding: test/test-reports/python-pytest/typing.test_python_operators/typing.test_python_operators-7b01e9f4c56696ce.xml (deflated 96%) 2025-12-04T15:43:32.3956821Z adding: test/test-reports/python-pytest/torch_np.test_dtype/torch_np.test_dtype-50c590a3e827391c.xml (deflated 94%) 2025-12-04T15:43:32.3958026Z adding: test/test-reports/python-pytest/test_file_check/test_file_check-c5f916d4f839abe2.xml (deflated 47%) 2025-12-04T15:43:32.3959055Z adding: 
test/test-reports/python-pytest/profiler.test_kineto/profiler.test_kineto-1437f02ea71dbd19.xml (deflated 37%) 2025-12-04T15:43:32.3960147Z adding: test/test-reports/python-pytest/functorch.test_ac_knapsack/functorch.test_ac_knapsack-a2f3dae1f99bc885.xml (deflated 78%) 2025-12-04T15:43:32.3988896Z adding: test/test-reports/python-pytest/torch_np.test_nep50_examples/torch_np.test_nep50_examples-87e42828c2fde829.xml (deflated 99%) 2025-12-04T15:43:32.4004859Z adding: test/test-reports/python-pytest/test_torch/test_torch-6322eeaa434bd119.xml (deflated 92%) 2025-12-04T15:43:32.4006009Z adding: test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-6cf9ed264c8fa189.xml (deflated 28%) 2025-12-04T15:43:32.4145155Z adding: test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-510898c7a9dfb9c9.xml (deflated 97%) 2025-12-04T15:43:32.4158779Z adding: test/test-reports/python-pytest/test_modules/test_modules-1ceed37f0450876d.xml (deflated 94%) 2025-12-04T15:43:32.4163480Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-320a7bc7a2da135c.xml (deflated 94%) 2025-12-04T15:43:32.4166165Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-9c6a851d43187f63.xml (deflated 95%) 2025-12-04T15:43:32.4167606Z adding: test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-612fe6974f2e86fb.xml (deflated 35%) 2025-12-04T15:43:32.4168759Z adding: test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-573eaa6de6818c33.xml (deflated 89%) 2025-12-04T15:43:32.4169921Z adding: test/test-reports/python-pytest/test_shape_ops/test_shape_ops-8ae5e584fb53bb5e.xml (deflated 92%) 2025-12-04T15:43:32.4171299Z adding: test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-419c9aea1e4e06f2.xml (deflated 79%) 2025-12-04T15:43:32.4173051Z adding: 
test/test-reports/python-pytest/test_indexing/test_indexing-bb3db4f55bab2e87.xml (deflated 90%) 2025-12-04T15:43:32.4174155Z adding: test/test-reports/python-pytest/test_type_info/test_type_info-3cbecfd6afe8711f.xml (deflated 68%) 2025-12-04T15:43:32.4190641Z adding: test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-3265775c77799c99.xml (deflated 93%) 2025-12-04T15:43:32.4192195Z adding: test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5e8dbe55d5e60a97.xml (deflated 91%) 2025-12-04T15:43:32.4193614Z adding: test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-339f2b8a0ba2c562.xml (deflated 91%) 2025-12-04T15:43:32.4195255Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-7a9eb44e36e96ef2.xml (deflated 90%) 2025-12-04T15:43:32.4196964Z adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-8a1338a601c4ef0b.xml (deflated 86%) 2025-12-04T15:43:32.4198478Z adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-d08ca7b1f6355251.xml (deflated 81%) 2025-12-04T15:43:32.4199883Z adding: test/test-reports/python-pytest/nn.test_init/nn.test_init-bb3f84e769cc626f.xml (deflated 83%) 2025-12-04T15:43:32.4201130Z adding: test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-081f0752aeda15ae.xml (deflated 80%) 2025-12-04T15:43:32.4205619Z adding: test/test-reports/python-pytest/test_type_promotion/test_type_promotion-3f39f26aca555a70.xml (deflated 96%) 2025-12-04T15:43:32.4259787Z adding: test/test-reports/python-pytest/test_reductions/test_reductions-31a848701d5079bd.xml (deflated 96%) 2025-12-04T15:43:32.4260823Z adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204154318.xml (deflated 43%) 2025-12-04T15:43:32.4290378Z ##[group]Run # Remove any previous usage logs if they exist 
2025-12-04T15:43:32.4290833Z # Remove any previous usage logs if they exist 2025-12-04T15:43:32.4291204Z rm -f logs-*.zip 2025-12-04T15:43:32.4291548Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true 2025-12-04T15:43:32.4292048Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true 2025-12-04T15:43:32.4301440Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:43:32.4301818Z env: 2025-12-04T15:43:32.4302022Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:32.4302284Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:32.4302597Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:32.4303162Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:32.4303894Z FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212 2025-12-04T15:43:32.4304340Z ##[endgroup] 2025-12-04T15:43:32.4387775Z adding: usage_log.txt (deflated 58%) 2025-12-04T15:43:32.4444987Z adding: test/test-reports/inductor.test_aot_inductor_2.5_ac1d7e2a37fbed81_.log (deflated 90%) 2025-12-04T15:43:32.4460657Z adding: test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_1.4_295ecc74e041d7f8_.log (deflated 92%) 2025-12-04T15:43:32.4470539Z adding: test/test-reports/inductor.test_torchinductor_opinfo_4.14_2b71ae42f7581618_.log (deflated 92%) 2025-12-04T15:43:32.4478104Z adding: test/test-reports/inductor.test_torchinductor_opinfo_12.14_f1debdb3c47cb0ae_.log (deflated 91%) 2025-12-04T15:43:32.4483802Z adding: test/test-reports/inductor.test_flex_attention_6.6_cafbaa2a62098057_.log (deflated 89%) 2025-12-04T15:43:32.4843356Z adding: test/test-reports/inductor.test_fp8_1.1_440b1865b73f9802_.log (deflated 95%) 2025-12-04T15:43:32.4844682Z adding: test/test-reports/dynamo.test_model_output_1.1_2df9271f2ebae91b_.log (deflated 79%) 2025-12-04T15:43:32.4856013Z adding: test/test-reports/inductor.test_triton_kernels_1.1_4c43492168172809_.log (deflated 92%) 2025-12-04T15:43:32.4860603Z adding: 
test/test-reports/inductor.test_loop_ordering_1.1_cda1b68c4235c80b_.log (deflated 89%) 2025-12-04T15:43:32.4903558Z adding: test/test-reports/export.test_serdes_1.1_c37c9c83d5d3a964_.log (deflated 91%) 2025-12-04T15:43:32.4904773Z adding: test/test-reports/inductor.test_scatter_optimization_1.1_38363d3a7ae9f86e_.log (deflated 79%) 2025-12-04T15:43:32.4906883Z adding: test/test-reports/inductor.test_padding_1.1_3b58a6813a3709bc_.log (deflated 86%) 2025-12-04T15:43:32.4907599Z adding: test/test-reports/dynamo.test_callback_1.1_4647abf0637b193b_.log (deflated 61%) 2025-12-04T15:43:32.4908597Z adding: test/test-reports/inductor.test_custom_op_autotune_1.1_2272505dccfac9af_.log (deflated 62%) 2025-12-04T15:43:32.4917927Z adding: test/test-reports/test_cuda_1.1_5ed6ed395e86485d_.log (deflated 85%) 2025-12-04T15:43:32.4994042Z adding: test/test-reports/test_sparse_1.1_e217f60a40d48402_.log (deflated 95%) 2025-12-04T15:43:32.5001866Z adding: test/test-reports/test_ops_fwd_gradients_6.12_abead446b517b77f_.log (deflated 91%) 2025-12-04T15:43:32.5017348Z adding: test/test-reports/test_ops_gradients_2.10_8b90327e47e16b38_.log (deflated 92%) 2025-12-04T15:43:32.5034000Z adding: test/test-reports/test_ops_gradients_10.10_690d4f6748dd1bf7_.log (deflated 92%) 2025-12-04T15:43:32.5075761Z adding: test/test-reports/functorch.test_ops_3.6_4e22832cb04fe87a_.log (deflated 92%) 2025-12-04T15:43:32.5076461Z adding: test/test-reports/dynamo.test_after_aot_1.1_e8843ead62c525f1_.log (deflated 54%) 2025-12-04T15:43:32.5077343Z adding: test/test-reports/inductor.test_snode_runtime_1.1_f8102af9af532885_.log (deflated 79%) 2025-12-04T15:43:32.5094620Z adding: test/test-reports/inductor.test_compiled_autograd_1.2_d8737cb5eeb8c364_.log (deflated 90%) 2025-12-04T15:43:32.5139812Z adding: test/test-reports/test_testing_1.1_6250d60ab394f89f_.log (deflated 94%) 2025-12-04T15:43:32.5140524Z adding: test/test-reports/inductor.test_autoheuristic_1.1_6939193d627efb00_.log (deflated 50%) 
2025-12-04T15:43:32.5141302Z adding: test/test-reports/inductor.test_cutedsl_template_1.1_c65b62856ae46e85_.log (deflated 77%) 2025-12-04T15:43:32.5142113Z adding: test/test-reports/inductor.test_benchmark_fusion_1.1_f16e3698532d27f8_.log (deflated 76%) 2025-12-04T15:43:32.5142876Z adding: test/test-reports/inductor.test_remote_cache_1.1_e90358269eb2823f_.log (deflated 60%) 2025-12-04T15:43:32.5143821Z adding: test/test-reports/inductor.test_coordinate_descent_tuner_1.1_2fd6afd7cb5bda25_.log (deflated 68%) 2025-12-04T15:43:32.5144652Z adding: test/test-reports/inductor.test_inplace_padding_1.1_25c4b19bcfb0badf_.log (deflated 69%) 2025-12-04T15:43:32.5145427Z adding: test/test-reports/inductor.test_cudacodecache_1.1_20e9a908d42a6261_.log (deflated 56%) 2025-12-04T15:43:32.5146266Z adding: test/test-reports/inductor.test_minifier_utils_1.1_82d82b53a102b66f_.log (deflated 60%) 2025-12-04T15:43:32.5147020Z adding: test/test-reports/inductor.test_debug_trace_1.1_cc4f32af9453e690_.log (deflated 62%) 2025-12-04T15:43:32.5147744Z adding: test/test-reports/export.test_tree_utils_1.1_0e627f819fabbb55_.log (deflated 55%) 2025-12-04T15:43:32.5148563Z adding: test/test-reports/inductor.test_triton_wrapper_1.1_25aa967110a2fbe1_.log (deflated 53%) 2025-12-04T15:43:32.5149348Z adding: test/test-reports/inductor.test_static_cuda_launcher_1.1_0c71a221d8835012_.log (deflated 79%) 2025-12-04T15:43:32.5150185Z adding: test/test-reports/inductor.test_provenance_tracing_1.1_80110daa3530439c_.log (deflated 80%) 2025-12-04T15:43:32.5151265Z adding: test/test-reports/inductor.test_memory_planning_1.1_fa1d6b036138d22f_.log (deflated 59%) 2025-12-04T15:43:32.5166506Z adding: test/test-reports/export.test_cpp_serdes_1.1_75563679f31ba4f4_.log (deflated 89%) 2025-12-04T15:43:32.5630720Z adding: test/test-reports/inductor.test_control_flow_2.4_3b4432ec9408add0_.log (deflated 96%) 2025-12-04T15:43:32.5633467Z adding: test/test-reports/test_sort_and_select_1.1_bec7fa88f7702fb0_.log (deflated 89%) 
2025-12-04T15:43:32.5634198Z adding: test/test-reports/functorch.test_rearrange_1.1_a7b15b1a80eb0b56_.log (deflated 71%) 2025-12-04T15:43:32.5638631Z adding: test/test-reports/test_package_1.1_f2ef9e9917fb97f5_.log (deflated 87%) 2025-12-04T15:43:32.5639277Z adding: test/test-reports/test_mkl_verbose_1.1_a8ab8be9a564b785_.log (deflated 54%) 2025-12-04T15:43:32.5639976Z adding: test/test-reports/test_utils_config_module_1.1_aa22a3cb4155f80d_.log (deflated 80%) 2025-12-04T15:43:32.5640938Z adding: test/test-reports/test_hop_infra_1.1_f77bb32afa422f2e_.log (deflated 57%) 2025-12-04T15:43:32.5641682Z adding: test/test-reports/test_appending_byte_serializer_1.1_7e52ee648e02aa85_.log (deflated 62%) 2025-12-04T15:43:32.5644401Z adding: test/test-reports/test_ao_sparsity_1.1_c127cba34d71d100_.log (deflated 87%) 2025-12-04T15:43:32.5645084Z adding: test/test-reports/test_extension_utils_1.1_7f66e708b7c7a8bc_.log (deflated 57%) 2025-12-04T15:43:32.5647509Z adding: test/test-reports/nn.attention.test_fa4_1.1_59632c9893caec1b_.log (deflated 94%) 2025-12-04T15:43:32.5655143Z adding: test/test-reports/typing.test_python_operators_1.1_1dbf7db937cf8b4b_.log (deflated 93%) 2025-12-04T15:43:32.5656238Z adding: test/test-reports/torch_np.test_dtype_1.1_8ba7a24ba508317e_.log (deflated 88%) 2025-12-04T15:43:32.5656896Z adding: test/test-reports/test_file_check_1.1_e6044214ffdb04bb_.log (deflated 53%) 2025-12-04T15:43:32.5657721Z adding: test/test-reports/profiler.test_kineto_1.1_3901a608b259f0c8_.log (deflated 51%) 2025-12-04T15:43:32.5658780Z adding: test/test-reports/functorch.test_ac_knapsack_1.1_a4a52ea27bf21bce_.log (deflated 78%) 2025-12-04T15:43:32.5687811Z adding: test/test-reports/torch_np.test_nep50_examples_1.1_be93e5fc5572125c_.log (deflated 96%) 2025-12-04T15:43:32.5711258Z adding: test/test-reports/test_torch_1.1_ed3627b67cdc077e_.log (deflated 91%) 2025-12-04T15:43:32.5711923Z adding: test/test-reports/xpu.test_gemm_1.1_db81f0dcd896f79f_.log (deflated 48%) 
2025-12-04T15:43:32.5981773Z adding: test/test-reports/test_binary_ufuncs_1.1_d43f59e69a692663_.log (deflated 96%) 2025-12-04T15:43:32.6002483Z adding: test/test-reports/test_modules_2.4_d8a3e6157b79afbb_.log (deflated 93%) 2025-12-04T15:43:32.6009442Z adding: test/test-reports/torch_np.numpy_tests.linalg.test_linalg_1.1_3f3446ecd43fd597_.log (deflated 92%) 2025-12-04T15:43:32.6012319Z adding: test/test-reports/torch_np.numpy_tests.core.test_dtype_1.1_bb9947961cd52757_.log (deflated 91%) 2025-12-04T15:43:32.6013190Z adding: test/test-reports/lazy.test_debug_util_1.1_6159721dd42cd649_.log (deflated 51%) 2025-12-04T15:43:32.6013897Z adding: test/test-reports/nn.test_load_state_dict_1.1_1f7336ad32e96ae1_.log (deflated 85%) 2025-12-04T15:43:32.6016299Z adding: test/test-reports/test_shape_ops_1.1_17556160abffc005_.log (deflated 87%) 2025-12-04T15:43:32.6017578Z adding: test/test-reports/profiler.test_memory_profiler_1.1_f20e3ab107ff598c_.log (deflated 82%) 2025-12-04T15:43:32.6022296Z adding: test/test-reports/test_indexing_1.1_fbbd66d5cf2cd3ea_.log (deflated 90%) 2025-12-04T15:43:32.6022978Z adding: test/test-reports/test_type_info_1.1_02020d4e7679db8b_.log (deflated 61%) 2025-12-04T15:43:32.6039927Z adding: test/test-reports/functorch.test_aotdispatch_1.1_73fa05bc552fde2d_.log (deflated 91%) 2025-12-04T15:43:32.6041731Z adding: test/test-reports/test_scatter_gather_ops_1.1_e624bed173f96ebf_.log (deflated 89%) 2025-12-04T15:43:32.6057216Z adding: test/test-reports/test_cuda_multigpu_1.1_134114cd1fad822a_.log (deflated 85%) 2025-12-04T15:43:32.6058284Z adding: test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_a7d224f05328be14_.log (deflated 85%) 2025-12-04T15:43:32.6059252Z adding: test/test-reports/test_jit_autocast_1.1_449f99b0d0d7aa89_.log (deflated 81%) 2025-12-04T15:43:32.6059966Z adding: test/test-reports/test_xnnpack_integration_1.1_ef1a45d9c52ae3ce_.log (deflated 72%) 2025-12-04T15:43:32.6060903Z adding: 
test/test-reports/nn.test_init_1.1_414026fa8e0e69bb_.log (deflated 78%) 2025-12-04T15:43:32.6061565Z adding: test/test-reports/test_mobile_optimizer_1.1_2406b12c26273884_.log (deflated 67%) 2025-12-04T15:43:32.6062243Z adding: test/test-reports/test_type_promotion_1.1_a64bbb5536dae6ab_.log (deflated 94%) 2025-12-04T15:43:32.6151358Z adding: test/test-reports/test_reductions_1.1_4c27d813839f98a0_.log (deflated 96%) 2025-12-04T15:43:32.6180524Z ##[group]Run # Remove any previous debugging artifacts if they exist 2025-12-04T15:43:32.6181071Z # Remove any previous debugging artifacts if they exist 2025-12-04T15:43:32.6181482Z rm -f debug-*.zip 2025-12-04T15:43:32.6181761Z if [ -d 'test/debug' ]; then 2025-12-04T15:43:32.6182119Z  zip -r "debug-${FILE_SUFFIX}.zip" test/debug 2025-12-04T15:43:32.6182455Z fi 2025-12-04T15:43:32.6191394Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:43:32.6191771Z env: 2025-12-04T15:43:32.6191993Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:32.6192256Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:32.6192574Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:32.6193128Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:32.6193778Z FILE_SUFFIX: test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212 2025-12-04T15:43:32.6194221Z ##[endgroup] 2025-12-04T15:43:32.6287235Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T15:43:32.6287560Z with: 2025-12-04T15:43:32.6287773Z s3-bucket: gha-artifacts 2025-12-04T15:43:32.6288094Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T15:43:32.6288484Z retention-days: 14 2025-12-04T15:43:32.6288722Z if-no-files-found: warn 2025-12-04T15:43:32.6288990Z path: test-jsons-*.zip 2025-12-04T15:43:32.6289246Z name: artifact 2025-12-04T15:43:32.6289455Z region: us-east-1 2025-12-04T15:43:32.6289671Z env: 2025-12-04T15:43:32.6289874Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:32.6290134Z HAS_NVIDIA_GPU: 
true 2025-12-04T15:43:32.6290448Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:32.6291005Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:32.6291493Z ##[endgroup] 2025-12-04T15:43:32.9746884Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T15:43:32.9747344Z With the provided path, there will be 1 file uploaded 2025-12-04T15:43:32.9747998Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T15:43:32.9820820Z Starting upload of test-jsons-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip 2025-12-04T15:43:33.1822410Z Finished upload of test-jsons-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip 2025-12-04T15:43:33.2120624Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T15:43:33.2120942Z with: 2025-12-04T15:43:33.2121155Z s3-bucket: gha-artifacts 2025-12-04T15:43:33.2121478Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T15:43:33.2121824Z retention-days: 14 2025-12-04T15:43:33.2122072Z if-no-files-found: error 2025-12-04T15:43:33.2122334Z path: test-reports-*.zip 2025-12-04T15:43:33.2122580Z name: artifact 2025-12-04T15:43:33.2122794Z region: us-east-1 2025-12-04T15:43:33.2123003Z env: 2025-12-04T15:43:33.2123202Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:33.2123454Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:33.2123767Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:33.2124324Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:33.2124824Z ##[endgroup] 2025-12-04T15:43:33.5754507Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T15:43:33.5754939Z With the provided path, there will be 1 file uploaded 2025-12-04T15:43:33.5755392Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T15:43:33.5828900Z Starting upload of test-reports-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip 
2025-12-04T15:43:33.7656446Z Finished upload of test-reports-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip 2025-12-04T15:43:33.7966297Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T15:43:33.7966618Z with: 2025-12-04T15:43:33.7966819Z s3-bucket: gha-artifacts 2025-12-04T15:43:33.7967125Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T15:43:33.7967470Z retention-days: 14 2025-12-04T15:43:33.7967709Z if-no-files-found: ignore 2025-12-04T15:43:33.7967980Z path: logs-*.zip 2025-12-04T15:43:33.7968203Z name: artifact 2025-12-04T15:43:33.7968421Z region: us-east-1 2025-12-04T15:43:33.7968665Z env: 2025-12-04T15:43:33.7968890Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:33.7969138Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:33.7969444Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:33.7970008Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:33.7970506Z ##[endgroup] 2025-12-04T15:43:34.1281653Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T15:43:34.1282187Z With the provided path, there will be 1 file uploaded 2025-12-04T15:43:34.1282632Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T15:43:34.1355773Z Starting upload of logs-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip 2025-12-04T15:43:34.3096002Z Finished upload of logs-test-default-2-8-linux.g5.4xlarge.nvidia.gpu_57118183212.zip 2025-12-04T15:43:34.3394752Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T15:43:34.3395086Z with: 2025-12-04T15:43:34.3395285Z s3-bucket: gha-artifacts 2025-12-04T15:43:34.3395593Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T15:43:34.3395934Z retention-days: 14 2025-12-04T15:43:34.3396175Z if-no-files-found: ignore 2025-12-04T15:43:34.3396447Z path: debug-*.zip 2025-12-04T15:43:34.3396666Z name: artifact 2025-12-04T15:43:34.3396875Z region: us-east-1 2025-12-04T15:43:34.3397089Z env: 
2025-12-04T15:43:34.3397298Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:34.3397544Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:34.3397860Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:34.3398419Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:34.3398959Z ##[endgroup] 2025-12-04T15:43:34.6657180Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded. 2025-12-04T15:43:34.6952115Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T15:43:34.6952490Z # shellcheck disable=SC2156 2025-12-04T15:43:34.6953075Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T15:43:34.6962475Z shell: /usr/bin/bash -e {0} 2025-12-04T15:43:34.6962847Z env: 2025-12-04T15:43:34.6963047Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:34.6963308Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:34.6963608Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:34.6964174Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:34.6964678Z ##[endgroup] 2025-12-04T15:43:35.1051562Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a 2025-12-04T15:43:35.1052051Z with: 2025-12-04T15:43:35.1052344Z name: coredumps-default-2-8-linux.g5.4xlarge.nvidia.gpu 2025-12-04T15:43:35.1052725Z retention-days: 14 2025-12-04T15:43:35.1052983Z if-no-files-found: ignore 2025-12-04T15:43:35.1053238Z path: ./**/core.[1-9]* 2025-12-04T15:43:35.1053489Z s3-bucket: gha-artifacts 2025-12-04T15:43:35.1053743Z region: us-east-1 2025-12-04T15:43:35.1053943Z env: 2025-12-04T15:43:35.1054140Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:35.1054392Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:35.1054696Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:35.1055245Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 
2025-12-04T15:43:35.1055739Z ##[endgroup] 2025-12-04T15:43:49.6865296Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 2025-12-04T15:43:49.7315272Z Prepare all required actions 2025-12-04T15:43:49.7315630Z Getting action download info 2025-12-04T15:43:49.8903834Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548) 2025-12-04T15:43:50.2979921Z ##[group]Run ./.github/actions/upload-utilization-stats 2025-12-04T15:43:50.2980290Z with: 2025-12-04T15:43:50.2980486Z job_id: 57118183212 2025-12-04T15:43:50.2981165Z job_name: linux-jammy-cuda12.8-py3-gcc11-slow-gradcheck / test (default, 2, 8, linux.g5.4xlarge.nvidia.gpu, module:slowgradcheck, mem_leak_check) 2025-12-04T15:43:50.2981911Z workflow_name: periodic 2025-12-04T15:43:50.2982175Z workflow_run_id: 19922826259 2025-12-04T15:43:50.2982441Z workflow_attempt: 1 2025-12-04T15:43:50.2982662Z env: 2025-12-04T15:43:50.2982861Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:50.2983109Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:50.2983414Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:50.2983997Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:50.2984493Z ##[endgroup] 2025-12-04T15:43:50.3040175Z ##[group]Run actions/setup-python@v6 2025-12-04T15:43:50.3040451Z with: 2025-12-04T15:43:50.3040646Z python-version: 3.10 2025-12-04T15:43:50.3040883Z check-latest: false 2025-12-04T15:43:50.3041221Z token: *** 2025-12-04T15:43:50.3041436Z update-environment: true 2025-12-04T15:43:50.3041698Z allow-prereleases: false 2025-12-04T15:43:50.3041953Z freethreaded: false 2025-12-04T15:43:50.3042182Z env: 2025-12-04T15:43:50.3042375Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:50.3042610Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:50.3042903Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:50.3043453Z DOCKER_CONTAINER_ID: 
5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:50.3043942Z ##[endgroup] 2025-12-04T15:43:50.6599406Z ##[group]Installed versions 2025-12-04T15:43:50.6608558Z Version 3.10 was not found in the local cache 2025-12-04T15:43:50.6824310Z (node:242511) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2025-12-04T15:43:50.6825096Z (Use `node --trace-deprecation ...` to show where the warning was created) 2025-12-04T15:43:51.1411054Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system. The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json 2025-12-04T15:43:51.1625618Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-12-04T15:43:51.1626166Z with: 2025-12-04T15:43:51.1626365Z env: 2025-12-04T15:43:51.1626571Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:51.1626834Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:51.1627146Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:51.1627706Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:51.1628217Z ##[endgroup] 2025-12-04T15:43:51.1644421Z ##[group]Run set -eou pipefail 2025-12-04T15:43:51.1644735Z set -eou pipefail 2025-12-04T15:43:51.1657160Z  2025-12-04T15:43:51.1657535Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2025-12-04T15:43:51.1658000Z for _ in $(seq 1440); do 2025-12-04T15:43:51.1658331Z  # Break if no ssh session exists anymore 2025-12-04T15:43:51.1658674Z  if [ "$(who)" = "" ]; then 2025-12-04T15:43:51.1658995Z  break 2025-12-04T15:43:51.1659287Z  fi 2025-12-04T15:43:51.1659499Z  echo "." 
2025-12-04T15:43:51.1659738Z  sleep 5 2025-12-04T15:43:51.1659967Z done 2025-12-04T15:43:51.1669007Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:43:51.1669373Z env: 2025-12-04T15:43:51.1669579Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:51.1669830Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:51.1670138Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:51.1670708Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:51.1671202Z ##[endgroup] 2025-12-04T15:43:51.1702667Z Holding runner for 2 hours until all ssh sessions have logged out 2025-12-04T15:43:51.1796705Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T15:43:51.1797504Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T15:43:51.1798085Z # shellcheck disable=SC2046 2025-12-04T15:43:51.1798461Z docker stop $(docker ps -q) || true 2025-12-04T15:43:51.1798809Z # Prune all of the docker images 2025-12-04T15:43:51.1799121Z docker system prune -af 2025-12-04T15:43:51.1808767Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:43:51.1809141Z env: 2025-12-04T15:43:51.1809354Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:43:51.1809603Z HAS_NVIDIA_GPU: true 2025-12-04T15:43:51.1809907Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:43:51.1810459Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:43:51.1810953Z ##[endgroup] 2025-12-04T15:44:02.6322008Z 5d0babf71ea3 2025-12-04T15:44:07.5055002Z Deleted Containers: 2025-12-04T15:44:07.5055493Z 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:44:07.5055831Z 2025-12-04T15:44:20.2146344Z Deleted Images: 2025-12-04T15:44:20.2147529Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 
2025-12-04T15:44:20.2149090Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T15:44:20.2150025Z deleted: sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301 2025-12-04T15:44:20.2150675Z deleted: sha256:85a76b7bf29ad34eb76cce6f46af5d49a58b6272f80f983d5c769e82c7749301 2025-12-04T15:44:20.2151333Z deleted: sha256:0882f3ce59ff5ae30195ee4b059fc713e13eda107a3a7814a4616ac9058a30a4 2025-12-04T15:44:20.2151970Z deleted: sha256:64ba5b9344c11a3e4729136076830b90ac4cf1554046edb1bd4f0784b66ebd9b 2025-12-04T15:44:20.2152861Z deleted: sha256:88213c59cf461a65ab9b6cb07b4195dc9d41b5241c152daa002c7b3112e09124 2025-12-04T15:44:20.2153505Z deleted: sha256:4c0f83afa802ffbc05ebaf1aa50e48a2447c7c295549a6dded80ac63437906ca 2025-12-04T15:44:20.2154394Z deleted: sha256:6f7ec74460e8fb070c8209949095ea3be5f4e2fd69c9f750cd39ac4093f5e64b 2025-12-04T15:44:20.2155044Z deleted: sha256:d6928b0d1021b31942fdcb64e5eb4a34682de66e959dd424ed6ed02c29cd706d 2025-12-04T15:44:20.2155884Z deleted: sha256:4e9fbcb1705a6351bb34dd320558752614308636b94fd9ae6f26063e3deadc0a 2025-12-04T15:44:20.2156520Z deleted: sha256:43aabd0201f48712f21758071352dea029b4de37be08b2e2197706856a9ecbf2 2025-12-04T15:44:20.2157258Z deleted: sha256:940a98dec78303f0548beb1033242a45e9097607ef3e55c8b949b69b73d1b95e 2025-12-04T15:44:20.2157953Z deleted: sha256:d2849fa0e0411cf66e4408831d70e38838afb55b11a80c1c4d8aa0ae7dc9ca40 2025-12-04T15:44:20.2158577Z deleted: sha256:14f40d23c20c7e562623f89deb376520296758bc39dd3c77284049b84ebd8a31 2025-12-04T15:44:20.2159223Z deleted: sha256:a8ccba61f90ca097cb391d0f4fbed0d9f821d06b00e28f7332e9e2dcfcbac4ca 2025-12-04T15:44:20.2159875Z deleted: sha256:91b2060d290547d3b517d4a11d994bbe23f4560b5546cb91918ca1828dde6be1 2025-12-04T15:44:20.2160505Z deleted: sha256:b42a184755715dcfead7fad655a127433541d316d9628f5f730ff17ad5f8071c 2025-12-04T15:44:20.2161154Z deleted: 
sha256:aa5b4f3c9169061dc3c6da0e677e8a86f11ecb0a3f9fb4861ab3d8c04379775c 2025-12-04T15:44:20.2161811Z deleted: sha256:b4dcf450081a48d77fea0a21b8d810a69c03608a595e754fe7d365058d0579b7 2025-12-04T15:44:20.2162460Z deleted: sha256:4f7fe12d3d4f5bf890c7ada4ce16f17a105472aa6509a778f917dcce2f28174b 2025-12-04T15:44:20.2163107Z deleted: sha256:2d1d5a74182594f9a8553df00fdcfc809dba407bcd6700d667f862cbe9d555ce 2025-12-04T15:44:20.2163759Z deleted: sha256:d901e2f5d449aeed16b727bdcc11fc0e0f6c30c8fc5c39ac7eeac8a74d9d176c 2025-12-04T15:44:20.2164520Z deleted: sha256:a04df2603bd12372c6632469a9a81ebc4a8d677452c250672b9692884fa6a452 2025-12-04T15:44:20.2165162Z deleted: sha256:f438a6b52273a552dc3820a55c74c53a62a0eae9f2a7d21b37125add7d71639f 2025-12-04T15:44:20.2165801Z deleted: sha256:d4b09517e9518d709ac98b0ae6f8446ec9ac51688253607b1fca67aa2c87b3f4 2025-12-04T15:44:20.2166473Z deleted: sha256:c1fa38335237f5e7263e39d3d3de98215bcfbbb12b826955c02e149bf68efd13 2025-12-04T15:44:20.2167207Z deleted: sha256:c898d20a30de901fca74d7611663b17ab48e1726a11e031e40548ed16ee81877 2025-12-04T15:44:20.2167846Z deleted: sha256:3baceec7096518fcc10696feba551639d698b3145c2fc09cac927bb60c0fd751 2025-12-04T15:44:20.2168492Z deleted: sha256:5245aaaa3d5c3a19f76b9a6c920bd82d1a0ff5289f87c8c109652089709d9b3b 2025-12-04T15:44:20.2169127Z deleted: sha256:f05cc789b95246938c377f474c41187965b89ceac0250e7d5124bec32153f447 2025-12-04T15:44:20.2169841Z deleted: sha256:07ec4fc008de4e7a2c794ec7094cc72e0d287c04c8b2156163aee0bae147fe2d 2025-12-04T15:44:20.2170572Z deleted: sha256:c6302601ad5fde573c1f8c900250478fca7fdc6907d8fd4fae651b94b4d9264d 2025-12-04T15:44:20.2171222Z deleted: sha256:cc5e955ee1dc54931f02606c5ea87aae14f03b5d764492be611480ab041f2882 2025-12-04T15:44:20.2171866Z deleted: sha256:f21c03518996d98452338f4e80bcfd9b139a1dab155f4830be0d3f623035269f 2025-12-04T15:44:20.2172496Z deleted: sha256:519ca6f1279f7886f25f0005527cfa627deebbc5b7d7cdbfa7ef962bcfc4c26d 2025-12-04T15:44:20.2173132Z deleted: 
sha256:0ef990495216807d0175b192045be3f617e72331bc373b3434807f41bf69168d 2025-12-04T15:44:20.2173768Z deleted: sha256:7093edf7319e1f0e01654c3224e32c8dede5b948d106e0b9b03cbf0bb1091e33 2025-12-04T15:44:20.2174405Z deleted: sha256:c478161e058e2f4041555c3e880b95ee1ee047938dc58549a3a88135740996ae 2025-12-04T15:44:20.2175045Z deleted: sha256:9bb853b0d938cd7c36a80ce8ee40653f2c0ff92719209b11beb03acc8855ce3e 2025-12-04T15:44:20.2175699Z deleted: sha256:fdf2ace71a78ce6910ef9c4b073c195531da47022443b606bb92dcd6499b6afc 2025-12-04T15:44:20.2176506Z deleted: sha256:576c2b3770d871937d3cfb7014328bcb4bd1aed0c28bc438764b3bfdac4c1ac2 2025-12-04T15:44:20.2177433Z deleted: sha256:878e92b9cb82de09ac14a9d5f3f7bc2411a799b6f54d0d64b78c2bb4d1fdc0fc 2025-12-04T15:44:20.2178285Z deleted: sha256:85c8c3b98b65a6695f988a10cc66c981d73a3ef03eda15b8e14d227b50b56300 2025-12-04T15:44:20.2179037Z deleted: sha256:ce2ab3ba07794f9ee95d6ea7de6dcd3d2aed96561f9a79192dd56ca5bf29313a 2025-12-04T15:44:20.2179905Z deleted: sha256:37a6e12976ca957286977e696e63012ab9821214b0483fe1a48d29dcb280508a 2025-12-04T15:44:20.2180540Z deleted: sha256:cd1d5d3dd7038144ca6fe961c0d4c8e705625ae0c36190ba8b3e9602abedad19 2025-12-04T15:44:20.2181221Z deleted: sha256:0e707276e0be2e0008b86d594fadc0d16444d66c4fb7227c56f144cbb3c2affd 2025-12-04T15:44:20.2181870Z deleted: sha256:22d4aad6a2ada91b341c1225a0f314042b8aeabef7568c5c019709b058bf070b 2025-12-04T15:44:20.2182543Z deleted: sha256:ee4adacf4e0933131d0275eddad406b3c8147e6cf07a292b99f1aff4b5355f33 2025-12-04T15:44:20.2183193Z deleted: sha256:43da0b9e7c0e18403dcb834e53628dc7c970ccb2dbd091878c0d7c0170dbc97f 2025-12-04T15:44:20.2183846Z deleted: sha256:00571684bdcd75beda15eb7d4e79b5458bc914350f9bb4d87fcdc97ad15e0da1 2025-12-04T15:44:20.2184489Z deleted: sha256:41615f09950259f1d75e82ef35b6fc53b18fe71ebff143744cfd51009d04349e 2025-12-04T15:44:20.2185142Z deleted: sha256:75ab34d2eed3c7915467a506ab6dab2711918fbabe94add2fb5c62780221ab0c 2025-12-04T15:44:20.2185797Z deleted: 
sha256:0a39ef2bebf44c1c3893d1e5fb42dad48b8fac7ca673141267ee967f85455e89 2025-12-04T15:44:20.2186450Z deleted: sha256:9b7d024e48ba1f9824a54597621b1b062cbc4aa41a77d81ca538d6b5c24a612c 2025-12-04T15:44:20.2187109Z deleted: sha256:392257172de6434c271bd93394218a91e9aa86d7c18abc2f2759317b9d5fb6de 2025-12-04T15:44:20.2187839Z deleted: sha256:6c3232860b930866a463a356124fc392c7e5f04895695229257e8c3e8a02711d 2025-12-04T15:44:20.2188473Z deleted: sha256:63dd55b807215e2fa6c715419ac0c5072d02dddc848dbf74bb7e77b906b5eaed 2025-12-04T15:44:20.2189113Z deleted: sha256:07a8738c1b4584db72ed9aa60f5274321eb0ba16263450da3a75df8326ebc25f 2025-12-04T15:44:20.2189758Z deleted: sha256:053fe2965b01281d12040ec1893e0d1aa77362a49ea9a1067402272c69dad9f5 2025-12-04T15:44:20.2190385Z deleted: sha256:7857fb5eb181c4e80262ecab60bdd3c266cf3d1409ceb76c05882609b416a8d3 2025-12-04T15:44:20.2191033Z deleted: sha256:752528477fc99089de3bd2c6da7b30cf34f2e901fe06d8fcfe685b411461e883 2025-12-04T15:44:20.2191682Z deleted: sha256:cce0210e2f4b042601813df03aa294a86b0c668fcfc75f4c63f6fa12b2952e15 2025-12-04T15:44:20.2192326Z deleted: sha256:f2bb405a26705ecd12d21380d26d9355d01db3a2175080fbdb468f2b5a25a76c 2025-12-04T15:44:20.2192986Z deleted: sha256:ad430120d4ffbaf97cd8d6de6ea8eefa4a8f80ec45f0b176c6b26bff0970fd33 2025-12-04T15:44:20.2193645Z deleted: sha256:225a4910baea7cc540ed43eeac75046293800ab0b8e0192b51e991c8cb50bcf3 2025-12-04T15:44:20.2194300Z deleted: sha256:a259945b0c3507f049fbac10fb3d3ffe43d45e83c91b80ae8cd1dafb855ad83c 2025-12-04T15:44:20.2194940Z deleted: sha256:862a98881b1d5adad5c21d01602773b894794097de80964ef8f47bcaadb43255 2025-12-04T15:44:20.2195568Z deleted: sha256:1cf6d3c8b6c2694b79a2d08719594903811c330a36a4c7a8a7153a350b53d292 2025-12-04T15:44:20.2196212Z deleted: sha256:232a1ae8b0fee817ff7838bb5986a2f38377d3b1dbbf5217b576af0f953b0844 2025-12-04T15:44:20.2196883Z deleted: sha256:c72c5705dabd6314423dd7d4fb260a20d5d9886b2ebce60d19e9d78c4a2335c2 2025-12-04T15:44:20.2197702Z deleted: 
sha256:296734cf81fd92c913884d058908598424ffe072676e38de289bbab83768c7bd 2025-12-04T15:44:20.2198514Z deleted: sha256:7c76040481b889847a1804021aeff07547eaa4ee706d6137db218d497a8fd9c1 2025-12-04T15:44:20.2199234Z deleted: sha256:d5e293f5b354e8cbcc6de893ea72cc632b02d8fdfbb08ec3127c4e9662f3ebff 2025-12-04T15:44:20.2199877Z deleted: sha256:f35a64e429c88e249645090f21fbe7dae108d98e0ab4ea13184f24b3fd66c315 2025-12-04T15:44:20.2200516Z deleted: sha256:ce6ae8d595c8e69115c51b1ce4f9a9158795d7b863b1cb53f21c39a87974d41b 2025-12-04T15:44:20.2201275Z deleted: sha256:8941abaee59400fb9b3a60765fea4a1fc2a6a447467a6d983e84c7f72494a450 2025-12-04T15:44:20.2202323Z deleted: sha256:ef53c29a9a2c2bc80ffdb9bfaf92842436b5755ec1ce828b9d11e5e27d656ea1 2025-12-04T15:44:20.2203134Z deleted: sha256:7a347fb0acb43f1c814f8c8ff21185e8b5cf64d7bc5988cea060f77d906e08b5 2025-12-04T15:44:20.2203933Z deleted: sha256:cc855dc9be79496e15175569dced2d13477e50b077a5fd3945f9bf50018880c1 2025-12-04T15:44:20.2204837Z deleted: sha256:f7a9946ada3d4786658bc0b643808bb32a9a45e4e90e30dc43ee19e2dbe24024 2025-12-04T15:44:20.2205739Z deleted: sha256:c22a9215f62812c1d2e32827f5221ff556c5b6702aadbdab6b87b8293f19635e 2025-12-04T15:44:20.2206538Z deleted: sha256:959a56746620012e37c1def1a83c5afb1e7c0adc59b021a28beb53c24df98032 2025-12-04T15:44:20.2207401Z deleted: sha256:31a0fff0695bf6100c17954be72eab2095b466d559c75c3faf2a17d8c41e6ebe 2025-12-04T15:44:20.2208503Z deleted: sha256:c15e2b5241b9e55af1b2593e544391b4b44d0505e6528e8f12425136e93b424c 2025-12-04T15:44:20.2209297Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b 2025-12-04T15:44:20.2209967Z untagged: public.ecr.aws/docker/library/python:3.13 2025-12-04T15:44:20.2210875Z untagged: public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T15:44:20.2211934Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a 2025-12-04T15:44:20.2212762Z deleted: 
sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd 2025-12-04T15:44:20.2213574Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67 2025-12-04T15:44:20.2214385Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c 2025-12-04T15:44:20.2215217Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212 2025-12-04T15:44:20.2216035Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318 2025-12-04T15:44:20.2216840Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560 2025-12-04T15:44:20.2217670Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee 2025-12-04T15:44:20.2218159Z 2025-12-04T15:44:20.2218308Z Total reclaimed space: 38.02GB 2025-12-04T15:44:20.2264029Z ##[group]Run set +e 2025-12-04T15:44:20.2264339Z set +e 2025-12-04T15:44:20.2264552Z set -x 2025-12-04T15:44:20.2264765Z  2025-12-04T15:44:20.2264963Z nvidia-smi 2025-12-04T15:44:20.2265411Z # NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even in 2025-12-04T15:44:20.2266105Z # the case where the driver has already crashed as it still can get the driver version 2025-12-04T15:44:20.2266783Z # and some basic information like the bus ID. However, the rest of the information 2025-12-04T15:44:20.2267294Z # would be missing (ERR!), for example: 2025-12-04T15:44:20.2267606Z # 2025-12-04T15:44:20.2267897Z # +-----------------------------------------------------------------------------+ 2025-12-04T15:44:20.2268425Z # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | 2025-12-04T15:44:20.2268964Z # |-------------------------------+----------------------+----------------------+ 2025-12-04T15:44:20.2269488Z # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T15:44:20.2270074Z # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. 
| 2025-12-04T15:44:20.2270551Z # | | | MIG M. | 2025-12-04T15:44:20.2270909Z # |===============================+======================+======================| 2025-12-04T15:44:20.2271320Z # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | 2025-12-04T15:44:20.2271794Z # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | 2025-12-04T15:44:20.2272248Z # | | | ERR! | 2025-12-04T15:44:20.2272689Z # +-------------------------------+----------------------+----------------------+ 2025-12-04T15:44:20.2273064Z # 2025-12-04T15:44:20.2273353Z # +-----------------------------------------------------------------------------+ 2025-12-04T15:44:20.2273903Z # | Processes: | 2025-12-04T15:44:20.2274367Z # | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T15:44:20.2274818Z # | ID ID Usage | 2025-12-04T15:44:20.2275263Z # |=============================================================================| 2025-12-04T15:44:20.2275681Z # +-----------------------------------------------------------------------------+ 2025-12-04T15:44:20.2276050Z # 2025-12-04T15:44:20.2276434Z # This should be reported as a failure instead as it will guarantee to fail when 2025-12-04T15:44:20.2276944Z # Docker tries to run with --gpus all 2025-12-04T15:44:20.2277258Z # 2025-12-04T15:44:20.2277616Z # So, the correct check here is to query one of the missing piece of info like 2025-12-04T15:44:20.2278145Z # GPU name, so that the command can fail accordingly 2025-12-04T15:44:20.2278639Z nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T15:44:20.2279060Z NVIDIA_SMI_STATUS=$? 2025-12-04T15:44:20.2279325Z  2025-12-04T15:44:20.2279772Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T15:44:20.2280437Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T15:44:20.2281030Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 
2025-12-04T15:44:20.2281542Z  .github/scripts/stop_runner_service.sh 2025-12-04T15:44:20.2281867Z fi 2025-12-04T15:44:20.2282063Z  2025-12-04T15:44:20.2282583Z # For runner with multiple GPUs, we also want to confirm that the number of GPUs are the 2025-12-04T15:44:20.2283211Z # power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails 2025-12-04T15:44:20.2283741Z # https://github.com/pytorch/test-infra/issues/4000 2025-12-04T15:44:20.2284171Z GPU_COUNT=$(nvidia-smi --list-gpus | wc -l) 2025-12-04T15:44:20.2284526Z NVIDIA_SMI_STATUS=$? 2025-12-04T15:44:20.2284795Z  2025-12-04T15:44:20.2285214Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T15:44:20.2285855Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T15:44:20.2286436Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T15:44:20.2286939Z  .github/scripts/stop_runner_service.sh 2025-12-04T15:44:20.2287248Z fi 2025-12-04T15:44:20.2287449Z  2025-12-04T15:44:20.2287689Z # Check the GPU count to be a power of 2 2025-12-04T15:44:20.2288240Z if [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then 2025-12-04T15:44:20.2288985Z  echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..." 
2025-12-04T15:44:20.2289559Z  .github/scripts/stop_runner_service.sh 2025-12-04T15:44:20.2289882Z fi 2025-12-04T15:44:20.2300596Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:44:20.2300962Z env: 2025-12-04T15:44:20.2301162Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:44:20.2301410Z HAS_NVIDIA_GPU: true 2025-12-04T15:44:20.2301713Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T15:44:20.2302314Z DOCKER_CONTAINER_ID: 5d0babf71ea38114e74fa8d779046640e9a746eb182940c07ee3e84ae026eaf7 2025-12-04T15:44:20.2302816Z ##[endgroup] 2025-12-04T15:44:20.2340785Z + nvidia-smi 2025-12-04T15:44:20.2574496Z Thu Dec 4 15:44:20 2025 2025-12-04T15:44:20.2575023Z +-----------------------------------------------------------------------------------------+ 2025-12-04T15:44:20.2575824Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T15:44:20.2576344Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T15:44:20.2576917Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T15:44:20.2577587Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T15:44:20.2578040Z | | | MIG M. 
| 2025-12-04T15:44:20.2578398Z |=========================================+========================+======================| 2025-12-04T15:44:20.2791097Z | 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 | 2025-12-04T15:44:20.2791730Z | 0% 21C P8 10W / 300W | 0MiB / 23028MiB | 0% Default | 2025-12-04T15:44:20.2792232Z | | | N/A | 2025-12-04T15:44:20.2792706Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T15:44:20.2796191Z 2025-12-04T15:44:20.2796666Z +-----------------------------------------------------------------------------------------+ 2025-12-04T15:44:20.2797286Z | Processes: | 2025-12-04T15:44:20.2797887Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T15:44:20.2798377Z | ID ID Usage | 2025-12-04T15:44:20.2799008Z |=========================================================================================| 2025-12-04T15:44:20.2803398Z | No running processes found | 2025-12-04T15:44:20.2803935Z +-----------------------------------------------------------------------------------------+ 2025-12-04T15:44:20.5367765Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T15:44:20.5542178Z NVIDIA A10G 2025-12-04T15:44:20.5587818Z + NVIDIA_SMI_STATUS=0 2025-12-04T15:44:20.5588150Z + '[' 0 -ne 0 ']' 2025-12-04T15:44:20.5594968Z ++ nvidia-smi --list-gpus 2025-12-04T15:44:20.5595740Z ++ wc -l 2025-12-04T15:44:20.5820868Z + GPU_COUNT=1 2025-12-04T15:44:20.5821221Z + NVIDIA_SMI_STATUS=0 2025-12-04T15:44:20.5821528Z + '[' 0 -ne 0 ']' 2025-12-04T15:44:20.5821745Z + '[' 1 -le 8 ']' 2025-12-04T15:44:20.5821959Z + '[' 1 -ne 1 ']' 2025-12-04T15:44:20.5889994Z Post job cleanup. 2025-12-04T15:44:20.5966086Z Post job cleanup. 2025-12-04T15:44:20.6010946Z Post job cleanup. 
2025-12-04T15:44:20.7058142Z [command]/usr/bin/git version 2025-12-04T15:44:20.7127338Z git version 2.50.1 2025-12-04T15:44:20.7164989Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/11dd59f0-aa5a-483e-a1a9-e62eb03c751e/.gitconfig' 2025-12-04T15:44:20.7173900Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/11dd59f0-aa5a-483e-a1a9-e62eb03c751e' before making global git config changes 2025-12-04T15:44:20.7174873Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T15:44:20.7179352Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T15:44:20.7227686Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T15:44:20.7275423Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T15:44:20.7690309Z Entering 'android/libs/fbjni' 2025-12-04T15:44:20.7773061Z Entering 'third_party/FP16' 2025-12-04T15:44:20.7854134Z Entering 'third_party/FXdiv' 2025-12-04T15:44:20.7946185Z Entering 'third_party/NNPACK' 2025-12-04T15:44:20.8030932Z Entering 'third_party/NVTX' 2025-12-04T15:44:20.8113215Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T15:44:20.8195100Z Entering 'third_party/XNNPACK' 2025-12-04T15:44:20.8294452Z Entering 'third_party/aiter' 2025-12-04T15:44:20.8376453Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T15:44:20.8466640Z Entering 'third_party/benchmark' 2025-12-04T15:44:20.8548654Z Entering 'third_party/composable_kernel' 2025-12-04T15:44:20.8639903Z Entering 'third_party/cpp-httplib' 2025-12-04T15:44:20.8720808Z Entering 'third_party/cpuinfo' 2025-12-04T15:44:20.8801595Z Entering 'third_party/cudnn_frontend' 2025-12-04T15:44:20.8882784Z Entering 'third_party/cutlass' 2025-12-04T15:44:20.8977651Z 
Entering 'third_party/fbgemm' 2025-12-04T15:44:20.9060980Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T15:44:20.9138383Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T15:44:20.9224644Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T15:44:20.9301201Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T15:44:20.9388901Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T15:44:20.9473258Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T15:44:20.9544649Z Entering 'third_party/fbgemm/external/json' 2025-12-04T15:44:20.9632900Z Entering 'third_party/flash-attention' 2025-12-04T15:44:20.9713155Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T15:44:20.9793572Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T15:44:20.9886504Z Entering 'third_party/flatbuffers' 2025-12-04T15:44:20.9971021Z Entering 'third_party/fmt' 2025-12-04T15:44:21.0051555Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T15:44:21.0132681Z Entering 'third_party/gloo' 2025-12-04T15:44:21.0213045Z Entering 'third_party/googletest' 2025-12-04T15:44:21.0292463Z Entering 'third_party/ideep' 2025-12-04T15:44:21.0371483Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T15:44:21.0458439Z Entering 'third_party/ittapi' 2025-12-04T15:44:21.0541359Z Entering 'third_party/kineto' 2025-12-04T15:44:21.0624878Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T15:44:21.0700094Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T15:44:21.0786978Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T15:44:21.0865870Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T15:44:21.0944262Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T15:44:21.1024715Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 
2025-12-04T15:44:21.1106925Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T15:44:21.1185413Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T15:44:21.1266019Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T15:44:21.1345495Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T15:44:21.1422779Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T15:44:21.1501357Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:44:21.1583382Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:44:21.1670288Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T15:44:21.1751210Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T15:44:21.1834283Z Entering 'third_party/kleidiai' 2025-12-04T15:44:21.1913858Z Entering 'third_party/mimalloc' 2025-12-04T15:44:21.1992008Z Entering 'third_party/nlohmann' 2025-12-04T15:44:21.2072621Z Entering 'third_party/onnx' 2025-12-04T15:44:21.2171176Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T15:44:21.2256324Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T15:44:21.2339521Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T15:44:21.2423309Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T15:44:21.2500133Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T15:44:21.2580934Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T15:44:21.2660997Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T15:44:21.2738166Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T15:44:21.2815395Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T15:44:21.2890811Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:44:21.2970720Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:44:21.3055809Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T15:44:21.3157378Z Entering 'third_party/pocketfft' 2025-12-04T15:44:21.3236948Z Entering 'third_party/protobuf' 2025-12-04T15:44:21.3318284Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T15:44:21.3394868Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T15:44:21.3479178Z Entering 'third_party/psimd' 2025-12-04T15:44:21.3557478Z Entering 'third_party/pthreadpool' 2025-12-04T15:44:21.3635326Z Entering 'third_party/pybind11' 2025-12-04T15:44:21.3714176Z Entering 'third_party/python-peachpy' 2025-12-04T15:44:21.3794184Z Entering 'third_party/sleef' 2025-12-04T15:44:21.3872973Z Entering 'third_party/tensorpipe' 2025-12-04T15:44:21.3953581Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T15:44:21.4032468Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T15:44:21.4108867Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T15:44:21.4186483Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T15:44:21.4261863Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T15:44:21.4374500Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T15:44:21.4403537Z http.https://github.com/.extraheader 2025-12-04T15:44:21.4421388Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T15:44:21.4465837Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 
'http.https://github.com/.extraheader' || :" 2025-12-04T15:44:21.4866257Z Entering 'android/libs/fbjni' 2025-12-04T15:44:21.4919279Z http.https://github.com/.extraheader 2025-12-04T15:44:21.4968669Z Entering 'third_party/FP16' 2025-12-04T15:44:21.5021973Z http.https://github.com/.extraheader 2025-12-04T15:44:21.5071744Z Entering 'third_party/FXdiv' 2025-12-04T15:44:21.5126349Z http.https://github.com/.extraheader 2025-12-04T15:44:21.5175274Z Entering 'third_party/NNPACK' 2025-12-04T15:44:21.5229629Z http.https://github.com/.extraheader 2025-12-04T15:44:21.5279825Z Entering 'third_party/NVTX' 2025-12-04T15:44:21.5332422Z http.https://github.com/.extraheader 2025-12-04T15:44:21.5383478Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T15:44:21.5439733Z http.https://github.com/.extraheader 2025-12-04T15:44:21.5488808Z Entering 'third_party/XNNPACK' 2025-12-04T15:44:21.5541486Z http.https://github.com/.extraheader 2025-12-04T15:44:21.5611877Z Entering 'third_party/aiter' 2025-12-04T15:44:21.5663473Z http.https://github.com/.extraheader 2025-12-04T15:44:21.5714216Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T15:44:21.5765148Z http.https://github.com/.extraheader 2025-12-04T15:44:21.5829765Z Entering 'third_party/benchmark' 2025-12-04T15:44:21.5881317Z http.https://github.com/.extraheader 2025-12-04T15:44:21.5932646Z Entering 'third_party/composable_kernel' 2025-12-04T15:44:21.5984659Z http.https://github.com/.extraheader 2025-12-04T15:44:21.6044026Z Entering 'third_party/cpp-httplib' 2025-12-04T15:44:21.6099234Z http.https://github.com/.extraheader 2025-12-04T15:44:21.6150083Z Entering 'third_party/cpuinfo' 2025-12-04T15:44:21.6206427Z http.https://github.com/.extraheader 2025-12-04T15:44:21.6257544Z Entering 'third_party/cudnn_frontend' 2025-12-04T15:44:21.6314051Z http.https://github.com/.extraheader 2025-12-04T15:44:21.6365223Z Entering 'third_party/cutlass' 2025-12-04T15:44:21.6416368Z http.https://github.com/.extraheader 
2025-12-04T15:44:21.6475465Z Entering 'third_party/fbgemm' 2025-12-04T15:44:21.6528267Z http.https://github.com/.extraheader 2025-12-04T15:44:21.6580987Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T15:44:21.6628983Z http.https://github.com/.extraheader 2025-12-04T15:44:21.6678190Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T15:44:21.6728689Z http.https://github.com/.extraheader 2025-12-04T15:44:21.6786669Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T15:44:21.6839066Z http.https://github.com/.extraheader 2025-12-04T15:44:21.6889342Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T15:44:21.6940014Z http.https://github.com/.extraheader 2025-12-04T15:44:21.6998731Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T15:44:21.7048210Z http.https://github.com/.extraheader 2025-12-04T15:44:21.7097541Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T15:44:21.7149165Z http.https://github.com/.extraheader 2025-12-04T15:44:21.7197227Z Entering 'third_party/fbgemm/external/json' 2025-12-04T15:44:21.7249384Z http.https://github.com/.extraheader 2025-12-04T15:44:21.7304327Z Entering 'third_party/flash-attention' 2025-12-04T15:44:21.7356868Z http.https://github.com/.extraheader 2025-12-04T15:44:21.7406425Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T15:44:21.7462853Z http.https://github.com/.extraheader 2025-12-04T15:44:21.7519201Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T15:44:21.7569308Z http.https://github.com/.extraheader 2025-12-04T15:44:21.7629783Z Entering 'third_party/flatbuffers' 2025-12-04T15:44:21.7683100Z http.https://github.com/.extraheader 2025-12-04T15:44:21.7734595Z Entering 'third_party/fmt' 2025-12-04T15:44:21.7786906Z http.https://github.com/.extraheader 2025-12-04T15:44:21.7837660Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T15:44:21.7889657Z http.https://github.com/.extraheader 2025-12-04T15:44:21.7941178Z Entering 
'third_party/gloo' 2025-12-04T15:44:21.7992784Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8043600Z Entering 'third_party/googletest' 2025-12-04T15:44:21.8094186Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8144739Z Entering 'third_party/ideep' 2025-12-04T15:44:21.8196510Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8246674Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T15:44:21.8303224Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8362831Z Entering 'third_party/ittapi' 2025-12-04T15:44:21.8416286Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8467404Z Entering 'third_party/kineto' 2025-12-04T15:44:21.8523839Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8572484Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T15:44:21.8627908Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8676199Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T15:44:21.8727804Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8779409Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T15:44:21.8830109Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8881082Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T15:44:21.8931358Z http.https://github.com/.extraheader 2025-12-04T15:44:21.8981988Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T15:44:21.9033254Z http.https://github.com/.extraheader 2025-12-04T15:44:21.9080154Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T15:44:21.9132211Z http.https://github.com/.extraheader 2025-12-04T15:44:21.9188219Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T15:44:21.9238995Z http.https://github.com/.extraheader 2025-12-04T15:44:21.9289392Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T15:44:21.9341384Z http.https://github.com/.extraheader 2025-12-04T15:44:21.9395609Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T15:44:21.9446166Z http.https://github.com/.extraheader 2025-12-04T15:44:21.9497649Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T15:44:21.9551195Z http.https://github.com/.extraheader 2025-12-04T15:44:21.9601974Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T15:44:21.9652196Z http.https://github.com/.extraheader 2025-12-04T15:44:21.9702322Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:44:21.9753557Z http.https://github.com/.extraheader 2025-12-04T15:44:21.9806473Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:44:21.9858402Z http.https://github.com/.extraheader 2025-12-04T15:44:21.9916905Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T15:44:21.9967358Z http.https://github.com/.extraheader 2025-12-04T15:44:22.0016604Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T15:44:22.0066471Z http.https://github.com/.extraheader 2025-12-04T15:44:22.0121347Z Entering 'third_party/kleidiai' 2025-12-04T15:44:22.0173600Z http.https://github.com/.extraheader 2025-12-04T15:44:22.0228262Z Entering 'third_party/mimalloc' 2025-12-04T15:44:22.0280490Z http.https://github.com/.extraheader 2025-12-04T15:44:22.0334435Z Entering 'third_party/nlohmann' 2025-12-04T15:44:22.0385743Z http.https://github.com/.extraheader 2025-12-04T15:44:22.0439942Z Entering 'third_party/onnx' 2025-12-04T15:44:22.0491241Z http.https://github.com/.extraheader 2025-12-04T15:44:22.0557897Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T15:44:22.0609016Z 
http.https://github.com/.extraheader 2025-12-04T15:44:22.0663585Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T15:44:22.0714923Z http.https://github.com/.extraheader 2025-12-04T15:44:22.0765903Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T15:44:22.0815853Z http.https://github.com/.extraheader 2025-12-04T15:44:22.0866432Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T15:44:22.0913712Z http.https://github.com/.extraheader 2025-12-04T15:44:22.0963584Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T15:44:22.1018662Z http.https://github.com/.extraheader 2025-12-04T15:44:22.1066318Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T15:44:22.1115514Z http.https://github.com/.extraheader 2025-12-04T15:44:22.1165711Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T15:44:22.1214552Z http.https://github.com/.extraheader 2025-12-04T15:44:22.1263220Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T15:44:22.1317379Z http.https://github.com/.extraheader 2025-12-04T15:44:22.1366027Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T15:44:22.1415382Z http.https://github.com/.extraheader 2025-12-04T15:44:22.1463324Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:44:22.1513922Z http.https://github.com/.extraheader 2025-12-04T15:44:22.1565413Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:44:22.1615747Z http.https://github.com/.extraheader 2025-12-04T15:44:22.1669358Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T15:44:22.1718040Z http.https://github.com/.extraheader 2025-12-04T15:44:22.1796964Z Entering 'third_party/pocketfft' 2025-12-04T15:44:22.1849018Z http.https://github.com/.extraheader 2025-12-04T15:44:22.1898610Z Entering 
'third_party/protobuf' 2025-12-04T15:44:22.1952032Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2007030Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T15:44:22.2062761Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2112835Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T15:44:22.2163371Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2217657Z Entering 'third_party/psimd' 2025-12-04T15:44:22.2268914Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2320349Z Entering 'third_party/pthreadpool' 2025-12-04T15:44:22.2373094Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2423961Z Entering 'third_party/pybind11' 2025-12-04T15:44:22.2475223Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2525352Z Entering 'third_party/python-peachpy' 2025-12-04T15:44:22.2576330Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2629999Z Entering 'third_party/sleef' 2025-12-04T15:44:22.2680455Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2730576Z Entering 'third_party/tensorpipe' 2025-12-04T15:44:22.2781742Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2831370Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T15:44:22.2879249Z http.https://github.com/.extraheader 2025-12-04T15:44:22.2930978Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T15:44:22.2985265Z http.https://github.com/.extraheader 2025-12-04T15:44:22.3033414Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T15:44:22.3082755Z http.https://github.com/.extraheader 2025-12-04T15:44:22.3133853Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T15:44:22.3183551Z http.https://github.com/.extraheader 2025-12-04T15:44:22.3232392Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T15:44:22.3283562Z http.https://github.com/.extraheader 2025-12-04T15:44:22.3370502Z [command]/usr/bin/git config --local --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T15:44:22.3420971Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T15:44:22.3827623Z Entering 'android/libs/fbjni' 2025-12-04T15:44:22.3863426Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T15:44:22.3887938Z Entering 'third_party/FP16' 2025-12-04T15:44:22.3923556Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T15:44:22.3948661Z Entering 'third_party/FXdiv' 2025-12-04T15:44:22.3984155Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T15:44:22.4009440Z Entering 'third_party/NNPACK' 2025-12-04T15:44:22.4043987Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T15:44:22.4070925Z Entering 'third_party/NVTX' 2025-12-04T15:44:22.4107004Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T15:44:22.4134668Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T15:44:22.4169827Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T15:44:22.4194936Z Entering 'third_party/XNNPACK' 2025-12-04T15:44:22.4230198Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T15:44:22.4271141Z Entering 'third_party/aiter' 2025-12-04T15:44:22.4306167Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T15:44:22.4333550Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T15:44:22.4367023Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T15:44:22.4401275Z Entering 'third_party/benchmark' 2025-12-04T15:44:22.4436634Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T15:44:22.4461964Z Entering 'third_party/composable_kernel' 2025-12-04T15:44:22.4496812Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T15:44:22.4530737Z Entering 'third_party/cpp-httplib' 2025-12-04T15:44:22.4567090Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T15:44:22.4592293Z Entering 'third_party/cpuinfo' 2025-12-04T15:44:22.4627648Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T15:44:22.4653822Z Entering 'third_party/cudnn_frontend' 2025-12-04T15:44:22.4688783Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T15:44:22.4714876Z Entering 'third_party/cutlass' 2025-12-04T15:44:22.4749938Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T15:44:22.4784199Z Entering 'third_party/fbgemm' 2025-12-04T15:44:22.4819718Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T15:44:22.4847395Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T15:44:22.4880004Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T15:44:22.4905432Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T15:44:22.4938389Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T15:44:22.4971392Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T15:44:22.5004544Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T15:44:22.5029508Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T15:44:22.5062003Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T15:44:22.5095903Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T15:44:22.5128136Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T15:44:22.5151609Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T15:44:22.5187600Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T15:44:22.5210556Z Entering 'third_party/fbgemm/external/json' 2025-12-04T15:44:22.5243889Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T15:44:22.5271172Z Entering 'third_party/flash-attention' 2025-12-04T15:44:22.5306398Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T15:44:22.5331654Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T15:44:22.5363147Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T15:44:22.5393717Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T15:44:22.5426952Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T15:44:22.5463584Z Entering 'third_party/flatbuffers' 2025-12-04T15:44:22.5498725Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T15:44:22.5527490Z Entering 'third_party/fmt' 2025-12-04T15:44:22.5562392Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T15:44:22.5587884Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T15:44:22.5624956Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T15:44:22.5650920Z Entering 'third_party/gloo' 2025-12-04T15:44:22.5686592Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T15:44:22.5712390Z Entering 'third_party/googletest' 2025-12-04T15:44:22.5747106Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:44:22.5772445Z Entering 'third_party/ideep' 2025-12-04T15:44:22.5809550Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T15:44:22.5831544Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T15:44:22.5867215Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T15:44:22.5901278Z Entering 'third_party/ittapi' 2025-12-04T15:44:22.5937474Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T15:44:22.5962809Z Entering 'third_party/kineto' 2025-12-04T15:44:22.5997466Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config 
remote.origin.url 2025-12-04T15:44:22.6021378Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T15:44:22.6055496Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T15:44:22.6078504Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T15:44:22.6112349Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T15:44:22.6139545Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T15:44:22.6173864Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T15:44:22.6198565Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T15:44:22.6232507Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T15:44:22.6257005Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T15:44:22.6290258Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T15:44:22.6312997Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T15:44:22.6347486Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T15:44:22.6375944Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 
2025-12-04T15:44:22.6409882Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T15:44:22.6433522Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T15:44:22.6466361Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:44:22.6491369Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T15:44:22.6530956Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T15:44:22.6556505Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T15:44:22.6589859Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T15:44:22.6614677Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T15:44:22.6650338Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T15:44:22.6680134Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:44:22.6706824Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T15:44:22.6734391Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 
2025-12-04T15:44:22.6768362Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url
2025-12-04T15:44:22.6799879Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T15:44:22.6833122Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url
2025-12-04T15:44:22.6856907Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T15:44:22.6891776Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url
2025-12-04T15:44:22.6919439Z Entering 'third_party/kleidiai'
2025-12-04T15:44:22.6955410Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url
2025-12-04T15:44:22.6980874Z Entering 'third_party/mimalloc'
2025-12-04T15:44:22.7022014Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url
2025-12-04T15:44:22.7047187Z Entering 'third_party/nlohmann'
2025-12-04T15:44:22.7082134Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url
2025-12-04T15:44:22.7109063Z Entering 'third_party/onnx'
2025-12-04T15:44:22.7148486Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url
2025-12-04T15:44:22.7191814Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T15:44:22.7226679Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url
2025-12-04T15:44:22.7259411Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T15:44:22.7297102Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url
2025-12-04T15:44:22.7322016Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T15:44:22.7353889Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url
2025-12-04T15:44:22.7378115Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T15:44:22.7412749Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url
2025-12-04T15:44:22.7439126Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T15:44:22.7469646Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url
2025-12-04T15:44:22.7493990Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T15:44:22.7527720Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url
2025-12-04T15:44:22.7553482Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T15:44:22.7588234Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url
2025-12-04T15:44:22.7611456Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T15:44:22.7646431Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url
2025-12-04T15:44:22.7668340Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T15:44:22.7701351Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url
2025-12-04T15:44:22.7725118Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T15:44:22.7759271Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url
2025-12-04T15:44:22.7785319Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T15:44:22.7820219Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url
2025-12-04T15:44:22.7846911Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T15:44:22.7880145Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url
2025-12-04T15:44:22.7927987Z Entering 'third_party/pocketfft'
2025-12-04T15:44:22.7963473Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url
2025-12-04T15:44:22.7989916Z Entering 'third_party/protobuf'
2025-12-04T15:44:22.8025566Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url
2025-12-04T15:44:22.8051988Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T15:44:22.8084507Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url
2025-12-04T15:44:22.8110771Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T15:44:22.8144906Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url
2025-12-04T15:44:22.8173188Z Entering 'third_party/psimd'
2025-12-04T15:44:22.8209637Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url
2025-12-04T15:44:22.8236385Z Entering 'third_party/pthreadpool'
2025-12-04T15:44:22.8273618Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url
2025-12-04T15:44:22.8298736Z Entering 'third_party/pybind11'
2025-12-04T15:44:22.8334491Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url
2025-12-04T15:44:22.8360408Z Entering 'third_party/python-peachpy'
2025-12-04T15:44:22.8395640Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url
2025-12-04T15:44:22.8421658Z Entering 'third_party/sleef'
2025-12-04T15:44:22.8456901Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url
2025-12-04T15:44:22.8482707Z Entering 'third_party/tensorpipe'
2025-12-04T15:44:22.8518870Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url
2025-12-04T15:44:22.8543783Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T15:44:22.8575831Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url
2025-12-04T15:44:22.8600490Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T15:44:22.8634303Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url
2025-12-04T15:44:22.8657796Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T15:44:22.8690047Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url
2025-12-04T15:44:22.8714407Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T15:44:22.8746796Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url
2025-12-04T15:44:22.8768705Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T15:44:22.8804048Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url
2025-12-04T15:44:22.8862240Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.8896854Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.8930570Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.8965214Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.8999190Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9033908Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9070123Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9103414Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9137665Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9171181Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9205590Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9241126Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9274701Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9310042Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9343312Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9375098Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9407501Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9440611Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9473011Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9505234Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9539340Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9575372Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9616316Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9649610Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9683030Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9717067Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9750281Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9785662Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9821712Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9854696Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9888213Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9920956Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9954985Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:22.9989294Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0023730Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0057577Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0092467Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0130515Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0165882Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0201633Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0236249Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0271200Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0305476Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0341807Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0375368Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0409036Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0442811Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0476328Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0509519Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0544457Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0578171Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0611203Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0645231Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0678895Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0714320Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0748211Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0782580Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0817562Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0851316Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0886953Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0921621Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0956220Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.0991347Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1029210Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1065278Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1100841Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1137154Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1173831Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1210465Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1244537Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1279572Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1314761Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1349919Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1385561Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1421269Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1455907Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1490615Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1526258Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1561512Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1599070Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1634059Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T15:44:23.1793604Z A job completed hook has been configured by the self-hosted runner administrator
2025-12-04T15:44:23.1813025Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh'
2025-12-04T15:44:23.1821253Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T15:44:23.1821746Z ##[endgroup]
2025-12-04T15:44:31.2428163Z Cleaning up orphan processes